$L^1$ BOUNDS IN NORMAL APPROXIMATION

By Larry Goldstein 

University of Southern California 

The zero bias distribution $W^*$ of $W$, defined through the characterizing equation $EWf(W) = \sigma^2 Ef'(W^*)$ for all smooth functions $f$, exists for all $W$ with mean zero and finite variance $\sigma^2$. For $W$ and $W^*$ defined on the same probability space, the $L^1$ distance between $F$, the distribution function of $W$ with $EW = 0$ and $\mathrm{Var}(W) = 1$, and the cumulative standard normal $\Phi$ has the simple upper bound

$$\|F - \Phi\|_1 \le 2E|W^* - W|.$$

This inequality is used to provide explicit $L^1$ bounds with moderate-sized constants for independent sums, projections of cone measure on the sphere $S(\ell_p^n)$, simple random sampling and combinatorial central limit theorems.

1. Introduction. The zero bias transformation and its use in Stein's method [21] for normal approximation were introduced in [10]. There, it was shown that for any mean zero random variable $W$ with finite variance $\sigma^2$, there exists $W^*$ which satisfies

$$EWf(W) = \sigma^2 Ef'(W^*) \tag{1}$$

for all absolutely continuous $f$ with $E|Wf(W)| < \infty$. We say that such a $W^*$ has the $W$-zero biased distribution. Study of the zero bias distribution was motivated by the size bias transformation and Stein's characterization of the normal (see, e.g., [22]), which shows that $Z \sim \mathcal{N}(0, \sigma^2)$ if and only if

$$EZf(Z) = \sigma^2 Ef'(Z) \tag{2}$$

for all absolutely continuous $f$ with $E|Zf(Z)| < \infty$.
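As a small numerical illustration of (1), take the hypothetical example of a symmetric $\pm 1$ variable $W$, so $\sigma^2 = 1$. Its zero biased distribution is uniform on $[-1, 1]$ (the Bernoulli case of Corollary 2.1 below with $p = 1/2$). The sketch assumes the test function $f = \sin$. It computes the left side of (1) exactly and the right side by a midpoint rule. The function names are our own.

```python
import math

def lhs_rademacher(f):
    """E[W f(W)] for W = +1 or -1 with probability 1/2 each."""
    return 0.5 * (1.0 * f(1.0)) + 0.5 * (-1.0 * f(-1.0))

def rhs_uniform(fprime, sigma2=1.0, m=100_000):
    """sigma^2 E[f'(W*)] for W* ~ Uniform[-1, 1], by a midpoint rule with m cells."""
    h = 2.0 / m
    integral = sum(fprime(-1.0 + (k + 0.5) * h) for k in range(m)) * h
    return sigma2 * integral / 2.0   # the uniform density on [-1, 1] is 1/2

lhs = lhs_rademacher(math.sin)   # here sigma^2 = Var(W) = 1
rhs = rhs_uniform(math.cos)      # f = sin, so f' = cos
print(lhs, rhs)                  # both equal sin(1) up to quadrature error
```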

It is helpful to consider the transformation characterized by (1) as a mapping $W \to W^*$ whose domain is the collection of all mean zero distributions with variance $\sigma^2$. From Stein's characterization (2), it is immediate that this transformation has as its unique fixed point the mean zero normal distribution with variance $\sigma^2$. It seems natural, then, that an approximate fixed point of the transformation would be approximately normal and that we can measure the distance of the distribution of $W$ to the normal by the distance between $W$ and its zero bias version $W^*$.

Received January 2006; revised July 2006.

AMS 2000 subject classifications. 60F05, 60F25, 60D05, 60C05.

Key words and phrases. Stein's method, Berry-Esseen, cone measure, sampling, combinatorial CLT.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Probability, 2007, Vol. 35, No. 5, 1888-1930. This reprint differs from the original in pagination and typographic detail.

Here, we consider the $L^1$ distance between distribution functions $F$ and $G$ given by

$$\|F - G\|_1 = \int_{-\infty}^{\infty} |F(t) - G(t)|\, dt \tag{3}$$

and known by many names, including Gini's measure of discrepancy, the Kantorovich metric (see [19]), as well as the Wasserstein, Dudley and Fortet-Mourier distance (see, e.g., [3]). If $F$ is the distribution function of a mean zero, variance 1 random variable $W$ and $F^*$ is that of $W^*$ having the $W$-zero biased distribution, Lemma 2.1 of [8] yields that

$$\|F - \Phi\|_1 \le 2\|F^* - F\|_1, \tag{4}$$

where $\Phi$ is the cumulative distribution function of the standard normal. To bound the right-hand side of (4), it can be convenient to use the dual form of the $L^1$ distance (see [19]) given by

$$\|F - G\|_1 = \inf E|X - Y|, \tag{5}$$

where the infimum is over all couplings of $X$ and $Y$ on a joint space with marginal distributions $F$ and $G$, respectively. Since the dual representation (5) says that $\|F^* - F\|_1$ is upper bounded by $E|W^* - W|$ for any coupling of $W$ and $W^*$, the following result is immediate.

Theorem 1.1. Let $W$ be a mean zero, variance 1 random variable with distribution function $F$ and let $W^*$ have the $W$-zero biased distribution and be defined on the same space as $W$. Then, with $\Phi$ the cumulative distribution function of the standard normal,

$$\|F - \Phi\|_1 \le 2E|W^* - W|.$$

The goal of this work is to apply Theorem 1.1 to obtain bounds to the 
normal for a variety of examples and to express the resulting upper bounds as 
a third-moment-type quantity multiplied by an explicit, moderate constant; 
in particular, we study sums of independent variables, projections of cone 
measure, simple random sampling and combinatorial central limit theorems. 

In Section 2, we begin by considering the case where $Y = \sum_{i=1}^n Y_i$ is the sum of independent mean zero random variables with finite variances $\sigma_i^2 = \mathrm{Var}(Y_i)$, not only to illustrate the method, but also to take advantage of the fact that the particularly simple construction of $Y^*$ in this case allows for the computation of constants in the bound which are explicit functions of the summand distribution. In particular, letting $I$ be an independent random index with distribution

$$P(I = i) = \frac{\sigma_i^2}{\sigma^2}, \quad \text{where } \sigma^2 = \mathrm{Var}(Y) = \sum_{j=1}^n \sigma_j^2, \tag{6}$$

the argument proving part (v) of Lemma 2.1 in [10] shows that removing $Y_I$ and replacing it by a variable $Y_I^*$ having the $Y_I$-zero bias distribution, independent of $\{Y_j, j \ne I\}$, gives a variable $Y^*$ with the $Y$-zero bias distribution, that is, that

$$Y^* = Y - Y_I + Y_I^* \tag{7}$$

has the $Y$-zero biased distribution. We apply this construction and Theorem 1.1 to derive Theorem 2.1 and Corollary 2.1, which yield, for example, that if $F$ is the distribution function of $W = n^{-1/2}\sum_{i=1}^n X_i$, the normalized sum of $n$ i.i.d. variables $X_1, \ldots, X_n$ with the uniform distribution standardized to have mean zero and variance 1, then

$$\|F - \Phi\|_1 \le \frac{\sqrt{3}}{4\sqrt{n}} = \frac{1}{3}\,\frac{E|X_1|^3}{\sqrt{n}} \quad \text{for all } n = 1, 2, \ldots,$$

that is, we obtain a Berry-Esseen type bound, using the $L^1$ metric, with a constant of 1/3.

In Section 3 we present two constructions of the zero bias distribution $Y^*$ for $Y = \sum_i Y_i$ which can be used in the presence of dependence. Both of these constructions are related to the one used for size biasing, which is reviewed in Section 3.1. The first zero bias construction, presented in Section 3.2, can be applied to random vectors $\mathbf{Y} \in \mathbb{R}^n$ which are coordinate symmetric (also called unconditional), that is, vectors for which

$$(Y_1, \ldots, Y_n) =_d (e_1 Y_1, \ldots, e_n Y_n) \quad \text{for all } (e_1, \ldots, e_n) \in \{-1, 1\}^n. \tag{8}$$

The second construction of $Y^*$, presented in Section 3.3, depends on the existence of an exchangeable pair $(Y', Y'')$ as in Stein [23], whose components have marginal distribution equal to that of $Y$, and which satisfies the linearity condition

$$E(Y'' \mid Y') = (1 - \lambda) Y' \quad \text{for some } \lambda \in (0, 1). \tag{9}$$

This construction appeared in [10] and was applied in [9] to obtain supremum norm bounds in normal approximation.

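The zero bias construction given in Section 3.2 is used in Section 4 to obtain bounds for the normal approximation for one-dimensional projections of the form

$$Y = \theta \cdot \mathbf{X}, \tag{10}$$

where for some $p > 0$, the vector $\mathbf{X} \in \mathbb{R}^n$ has cone measure $\mathcal{C}_p^n$ and $\theta \in \mathbb{R}^n$ is of unit length. To define $\mathcal{C}_p^n$, let

$$S(\ell_p^n) = \Big\{\mathbf{x} \in \mathbb{R}^n : \sum_{i=1}^n |x_i|^p = 1\Big\} \quad \text{and} \quad B(\ell_p^n) = \Big\{\mathbf{x} \in \mathbb{R}^n : \sum_{i=1}^n |x_i|^p \le 1\Big\}. \tag{11}$$

Then, with $\mu$ Lebesgue measure in $\mathbb{R}^n$, the cone measure of $A \subset S(\ell_p^n)$ is given by

$$\mathcal{C}_p^n(A) = \frac{\mu([0,1]A)}{\mu(B(\ell_p^n))}, \quad \text{where } [0,1]A = \{ta : a \in A,\ 0 \le t \le 1\}. \tag{12}$$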
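Cone measure is easy to simulate, which helps when experimenting with the projections (10). The sketch below relies on a representation that is not stated in this text and is assumed here. If $G_1, \ldots, G_n$ are i.i.d. with density proportional to $e^{-|t|^p}$, then $\mathbf{G}/\|\mathbf{G}\|_p$ has cone measure $\mathcal{C}_p^n$ (a result usually credited to Schechtman and Zinn, and to Rachev and Rüschendorf). Each $|G_i|^p$ is then Gamma distributed with shape $1/p$, so it can be drawn with the standard library's `random.gammavariate`. The helper names are hypothetical. As a check, for $p = 2$ we have $\sum_i X_i^2 = 1$, so exchangeability forces $EX_1^2 = 1/n$. By (31) below, the projection then has variance $1/n$.

```python
import random

def sample_cone_measure(n, p, rng):
    """One draw from cone measure C_p^n on S(l_p^n).

    ASSUMPTION: uses the representation G / ||G||_p, where G_1, ..., G_n are
    i.i.d. with density proportional to exp(-|t|^p); then |G_i|^p ~ Gamma(1/p, 1).
    """
    g = []
    for _ in range(n):
        magnitude = rng.gammavariate(1.0 / p, 1.0) ** (1.0 / p)   # |G_i|
        sign = 1.0 if rng.random() < 0.5 else -1.0
        g.append(sign * magnitude)
    norm_p = sum(abs(v) ** p for v in g) ** (1.0 / p)
    return [v / norm_p for v in g]

def project(theta, x):
    """The projection Y = theta . X of (10)."""
    return sum(t * xi for t, xi in zip(theta, x))

rng = random.Random(2007)
n, p = 5, 2.0
theta = [n ** -0.5] * n                        # unit vector with equal components
draws = [sample_cone_measure(n, p, rng) for _ in range(20_000)]
ys = [project(theta, x) for x in draws]
mean_y = sum(ys) / len(ys)
var_y = sum(y * y for y in ys) / len(ys)       # the mean is zero, so this estimates Var(Y)
print(mean_y, var_y)                           # for p = 2, Var(Y) = E X_1^2 = 1/n
```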

Theorem 4.1 provides a normal bound for the projection $Y$ in (10) in terms of explicit and moderate constants and the quantity $\sum_{i=1}^n |\theta_i|^3$ depending on the projection $\theta$. Cone measure, for $p = 1$ and $p = 2$, respectively, includes the special cases of the uniform distribution over the simplex $\sum_{i=1}^n |x_i| = 1$ and the Euclidean sphere $\sum_{i=1}^n x_i^2 = 1$ in $\mathbb{R}^n$. For these two special cases and for $F$ the standardized distribution function of the projection (10), Theorem 4.1 specializes to, respectively,



(13) 



\F - Ml < —Y \9if + and 



9 " 



\F - ni < ^Y.\(^^f + 



for $\theta = n^{-1/2}(1, \ldots, 1)$, the sums in (13) are replaced by $n^{-1/2}$.

In Section 5, we turn our attention to simple random sampling of subsets of size $n$ from a set $\mathcal{A}$ of $N$ numerical characteristics, where each subset is selected uniformly, that is, with probability $\binom{N}{n}^{-1}$. The zero bias construction in Section 3.3 is applied in Theorem 5.1 to yield, under some basic nontriviality conditions, the following bound to normality for the distribution function $F$ of the standardized sum of the characteristics in the sample,

r'i I I /A !\ — 11,1 \ I II. ' ^ 



4a3 / n(iV - ra) \ / n V 
cjS \N{N -1))V^ n) 



where 

$\bar{a}$ is the average of the elements in $\mathcal{A}$ and $\sigma^2$ is the variance of the sum of the sampled characteristics, whose explicit form is given in (70).
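To check a sampling variance numerically, the sketch below enumerates all $\binom{N}{n}$ subsets of a small population and compares the exact variance of the sample sum with the classical finite-population formula $\frac{n(N-n)}{N(N-1)}\sum_{a \in \mathcal{A}} (a - \bar{a})^2$. That closed form is a standard fact assumed here. We have not checked that it matches the expression in (70) term for term. The population values and function names are hypothetical. Exact rational arithmetic makes the comparison an equality test.

```python
from fractions import Fraction
from itertools import combinations

def srs_sum_variance_exact(population, n):
    """Variance of the sum of a uniformly chosen n-subset, by full enumeration."""
    sums = [sum(s) for s in combinations(population, n)]
    k = len(sums)
    mean = Fraction(sum(sums), k)
    return sum((Fraction(s) - mean) ** 2 for s in sums) / k

def srs_sum_variance_formula(population, n):
    """Classical formula n(N - n) / (N(N - 1)) * sum over A of (a - abar)^2."""
    N = len(population)
    abar = Fraction(sum(population), N)
    ss = sum((Fraction(a) - abar) ** 2 for a in population)
    return Fraction(n * (N - n), N * (N - 1)) * ss

population = [1, 4, 4, 7, 10, 13]      # hypothetical characteristics, N = 6
for n in range(1, len(population)):
    assert srs_sum_variance_exact(population, n) == srs_sum_variance_formula(population, n)
print(srs_sum_variance_formula(population, 3))
```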
In Section 6, we study the accuracy of the normal approximation in the combinatorial central limit theorem. In particular, we apply the zero bias construction in Section 3.3 to variables of the form

$$Y = \sum_{i=1}^n a_{i\pi(i)}, \tag{14}$$

for $n$ a positive integer, $\{a_{ij}\}_{1 \le i, j \le n}$ the elements of a matrix $A \in \mathbb{R}^{n \times n}$, and $\pi$ a uniformly chosen random permutation in $\mathcal{S}_n$, the symmetric group. Theorem 6.1 yields, for the distribution function $F$ of the standardized variable $Y$ in (14),

i^-1> i<7 \^(lQ + -, ^ + 7 ^ , 

(n — 1)0"'^ V (n — 1) [n — ly J 

where

$$\beta_3 = \sum_{i,j=1}^n |a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot}|^3,$$

$a_{i\cdot}$, $a_{\cdot j}$ and $a_{\cdot\cdot}$ are the averages of $a_{ij}$ over $j$, over $i$, and over both $i$ and $j$, respectively, and $\sigma^2$ is the variance of $Y$, whose explicit form is given in (88). When the elements of the population $\mathcal{A}$ or the matrix $A$ behave "typically," the bounds provided by Theorems 5.1 and 6.1 will be of the best order, $n^{-1/2}$.
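The variance in (88) is not reproduced in this section. A classical result due to Hoeffding, assumed here rather than quoted from (88), gives $EY = n a_{\cdot\cdot}$ and $\mathrm{Var}(Y) = \frac{1}{n-1}\sum_{i,j}(a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot})^2$ for (14). The sketch below checks both identities for a small hypothetical matrix by enumerating all $n!$ permutations. It also evaluates the third-moment quantity $\sum_{i,j}|a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot}|^3$ defined above.

```python
from fractions import Fraction
from itertools import permutations

def centered_entries(a):
    """d_ij = a_ij - a_i. - a_.j + a_.. (row, column and grand averages removed)."""
    n = len(a)
    row = [Fraction(sum(a[i]), n) for i in range(n)]
    col = [Fraction(sum(a[i][j] for i in range(n)), n) for j in range(n)]
    grand = Fraction(sum(map(sum, a)), n * n)
    return [[a[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def exact_moments(a):
    """Mean and variance of Y = sum_i a[i][pi(i)] over all n! permutations pi."""
    n = len(a)
    values = [sum(a[i][pi[i]] for i in range(n)) for pi in permutations(range(n))]
    mean = Fraction(sum(values), len(values))
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

a = [[1, 5, 2, 0], [3, 3, 8, 1], [0, 2, 2, 9], [4, 1, 6, 2]]   # hypothetical matrix
n = len(a)
d = centered_entries(a)
mean, var = exact_moments(a)
assert mean == Fraction(sum(map(sum, a)), n)                  # E Y = n a_..
assert var == sum(x * x for r in d for x in r) / (n - 1)      # Hoeffding's variance
third = sum(abs(x) ** 3 for r in d for x in r)                # third-moment quantity
print(mean, var, float(third))
```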

The zero bias transformation was introduced in [10] to provide smooth function bounds of order $1/n$ for simple random sampling, and the coupling given here in Section 5 for that case is related to the one used there. In [9], the zero bias transformation is used to obtain bounds on the supremum, or $L^\infty$, distance between the distribution of the sum $Y$ in (14) and the normal, in terms of the maximum of $|a_{ij}|$; the coupling construction of $W$ to $W^*$ in Section 6 of this paper was first given there. Here, the $L^1$ distance is used and the form of the bounds is improved, in that they are expressed in terms of third-moment-type quantities. Also, in [9], supremum norm bounds, again in terms of the maximum of $|a_{ij}|$, were computed for $Y$ when $\pi$ has a distribution constant on cycle type. The bound (4) was first shown in [8] and applied there to derive the rate of convergence to the normal for hierarchical sequences $X_1, X_2, \ldots$ of random variables whose distributions for some $k \ge 1$ and $f : \mathbb{R}^k \to \mathbb{R}$ satisfy

$$X_{n+1} = f(X_{n,1}, \ldots, X_{n,k}), \quad n \ge 1,$$

where $X_{n,1}, \ldots, X_{n,k}$ are i.i.d. with distribution equal to that of $X_n$.
2. Independent variables. In this section, we demonstrate the application of Theorem 1.1 and the construction (7) to produce bounds with small explicit constants for the $L^1$ distance of the distribution of sums of independent variables to the normal. The utility of Theorem 2.1 below is reflected by the fact that the distance on the left-hand side of (16) requires computation of a convolution, but is bounded on the right by terms which require only the calculation of integrals of the form (3) involving marginal distributions.

The proof of Theorem 2.1 requires the following simple proposition. The first claim is stated in (iii), Section 2.3 of [19]; the second is well known and follows immediately from the dual form (5) of the $L^1$ distance. For $H$ a distribution function on $\mathbb{R}$, let

$$H^{-1}(u) = \sup\{x : H(x) < u\} \quad \text{for } u \in (0, 1)$$

and let $\mathcal{U}(a, b)$ denote the uniform distribution on $(a, b)$.

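Proposition 2.1. For $F$ and $G$ distribution functions and $U \sim \mathcal{U}(0, 1)$, we have

$$\|F - G\|_1 = E|F^{-1}(U) - G^{-1}(U)|.$$

Further, for any $a > 0$ and $b \in \mathbb{R}$, where $F_{a,b}$ and $G_{a,b}$ are the distribution functions of $aX + b$ and $aY + b$, respectively, for $X$ and $Y$ with distribution functions $F$ and $G$,

$$\|F_{a,b} - G_{a,b}\|_1 = a\|F - G\|_1.$$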

Note that one consequence of the proposition is that the $L^1$ distance, as the infimum in (5), can always be achieved. In what follows, we will find it convenient to express relations like the second claim in Proposition 2.1 in a notation where the random variable replaces its distribution function, thus,

$$\|aX - aY\|_1 = a\|X - Y\|_1.$$
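The first claim of Proposition 2.1 has a discrete analogue that is easy to test. Take two samples of the same size $m$. Each empirical distribution function jumps by $1/m$ at its data points, and the quantile coupling $F^{-1}(U), G^{-1}(U)$ pairs the $k$th smallest value of one sample with the $k$th smallest of the other. The integral (3) should then equal the mean absolute difference of the sorted samples. The sketch below computes both quantities on randomly generated data. The function names are our own.

```python
import random

def l1_by_integration(xs, ys):
    """Integral (3) of |F_m - G_m| for the empirical cdfs of two equal-size samples."""
    m = len(xs)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical cdfs are constant on [left, right).
        f = sum(x <= left for x in xs) / m
        g = sum(y <= left for y in ys) / m
        total += abs(f - g) * (right - left)
    return total

def l1_by_quantile_coupling(xs, ys):
    """E|F^{-1}(U) - G^{-1}(U)| for empirical distributions: pair sorted values."""
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [rng.expovariate(1.0) - 1.0 for _ in range(200)]
print(l1_by_integration(xs, ys), l1_by_quantile_coupling(xs, ys))
```

Outside the range of the pooled data both empirical cdfs are 0 or both are 1, so restricting the integral to consecutive data points loses nothing.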

Theorem 2.1. Let $X_i$, $i = 1, \ldots, n$, be independent mean zero random variables with variances $\sigma_i^2 = \mathrm{Var}(X_i)$ satisfying $\sum_{i=1}^n \sigma_i^2 = 1$, and let

$$W = \sum_{i=1}^n X_i.$$

Then for $F$ the distribution function of $W$ and $\Phi$ that of the standard normal,

$$\|F - \Phi\|_1 \le 2E|X_I^* - X_I|, \tag{15}$$

where $X_i^*$ is any variable having the $X_i$-zero biased distribution, independent of $\{X_j, j \ne i\}$, $i = 1, \ldots, n$, and $I$ is a random index, independent of $\{X_i, X_i^*, i = 1, \ldots, n\}$, with distribution $P(I = i) = \sigma_i^2$.
Letting $G_i$ and $G_i^*$ be the distribution functions of $X_i$ and $X_i^*$, respectively, we have

$$\|F - \Phi\|_1 \le 2\sum_{i=1}^n \sigma_i^2 \|G_i^* - G_i\|_1. \tag{16}$$

In particular, when $W = n^{-1/2}\sum_{i=1}^n X_i$ for $X, X_1, \ldots, X_n$ i.i.d. with mean zero, variance 1 and distribution function $G$,

$$\|F - \Phi\|_1 \le \frac{2}{\sqrt{n}}\|G^* - G\|_1, \tag{17}$$

and $G^*$, the distribution function of $X^*$, may be given explicitly by

$$G^*(x) = E[X(X - x)\mathbf{1}(X \le x)]. \tag{18}$$

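Proof. The coupling (7) yields $W^* - W = X_I^* - X_I$, with $I$ having distribution as in (6), so (15) follows immediately from Theorem 1.1. Now, let $U_i$, $i = 1, \ldots, n$, be a collection of i.i.d. $\mathcal{U}(0, 1)$ variables and set

$$(X_i, X_i^*) = \big(G_i^{-1}(U_i), (G_i^*)^{-1}(U_i)\big), \quad i = 1, \ldots, n;$$

by Proposition 2.1, we have

$$E|X_i^* - X_i| = \|G_i^* - G_i\|_1.$$

Averaging the right-hand side of (15) over $I$ then yields (16) by

$$\|F - \Phi\|_1 \le 2E|X_I^* - X_I| = 2\sum_{i=1}^n \sigma_i^2 E|X_i^* - X_i| = 2\sum_{i=1}^n \sigma_i^2 \|G_i^* - G_i\|_1.$$

When the variables are i.i.d., $\sigma_i^2 = 1/n$ and each summand is $n^{-1/2}X_i$. Since (1) implies $(aX)^* =_d aX^*$ for $a > 0$, Proposition 2.1 shows that the bound becomes

$$2\sum_{i=1}^n \frac{1}{n}\|G_i^* - G_i\|_1 = 2\|n^{-1/2}X^* - n^{-1/2}X\|_1 = \frac{2}{\sqrt{n}}\|G^* - G\|_1,$$

proving (17).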

It is shown in [10] that for $X$ with mean zero and variance 1, the distribution function $G^*$ of $X^*$ is absolutely continuous with respect to Lebesgue measure with density $p^*(x) = -E[X\mathbf{1}(X \le x)]$. Hence, the distribution function of $X^*$ is

$$G^*(x) = -E\Big(X\int_{-\infty}^x \mathbf{1}(X \le u)\, du\Big) = -E\Big(X\int_X^x du\; \mathbf{1}(X \le x)\Big) = E[X(X - x)\mathbf{1}(X \le x)]. \qquad \square$$
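Formula (18) is easy to evaluate for a discrete $X$. The sketch below uses a hypothetical helper `zero_bias_cdf` to compute $G^*(x) = E[X(X - x)\mathbf{1}(X \le x)]$ for a finitely supported mean zero, variance 1 distribution. It applies this to the standardized Bernoulli variable $(B - p)/\sqrt{pq}$ treated in Corollary 2.1 below, with $p = 0.3$ chosen arbitrarily. The result is compared with the cdf of $(U - p)/\sqrt{pq}$, $U \sim \mathcal{U}(0,1)$, which is $p + x\sqrt{pq}$ between the two support points.

```python
import math

def zero_bias_cdf(support, probs, x):
    """G*(x) = E[X (X - x) 1(X <= x)] from (18); X assumed mean 0, variance 1."""
    return sum(w * v * (v - x) for v, w in zip(support, probs) if v <= x)

p = 0.3
q = 1.0 - p
s = math.sqrt(p * q)
support = [-p / s, q / s]      # values of the standardized Bernoulli (B - p)/sqrt(pq)
probs = [q, p]

for x in [-1.0, -0.5, 0.0, 0.7, 1.2, 2.0]:
    g_star = zero_bias_cdf(support, probs, x)
    uniform_cdf = min(max(p + x * s, 0.0), 1.0)   # cdf of (U - p)/sqrt(pq)
    print(x, g_star, uniform_cdf)
```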
Applying (17) and (18) in particular cases leads to the following corollary. 
Corollary 2.1. Let $B_1, \ldots, B_n$ be i.i.d. Bernoulli variables with success probability $p \in (0, 1)$, $q = 1 - p$ and $X_i = (B_i - p)/\sqrt{pq}$. Then for the distribution function $F$ of the sum $W = n^{-1/2}\sum_{i=1}^n X_i$ having the standardized binomial $\mathcal{B}(n, p)$ distribution, for every $n = 1, 2, \ldots$,

$$\|F - \Phi\|_1 \le \frac{p^2 + q^2}{\sqrt{npq}} = \frac{E|X_1|^3}{\sqrt{n}}$$

[noting that $E|X_1|^3 = (p^2 + q^2)/\sqrt{pq}$].

For $F$ the distribution function of the sum $W = n^{-1/2}\sum_{i=1}^n U_i$ of $U_1, \ldots, U_n$, i.i.d. variables with the mean zero, variance 1 uniform distribution $\mathcal{U}[-\sqrt{3}, \sqrt{3}]$, for every $n = 1, 2, \ldots$,

$$\|F - \Phi\|_1 \le \frac{\sqrt{3}}{4\sqrt{n}} = \frac{1}{3}\,\frac{E|U_1|^3}{\sqrt{n}}$$

(noting that $E|U_1|^3 = 3\sqrt{3}/4$).

If $X$ is any mean zero, variance $\sigma_X^2$ random variable with distribution function $G$ and $Z$ has the $\mathcal{N}(0, \sigma_Z^2)$ distribution and is independent of $X$, then when $\sigma_X^2 + \sigma_Z^2 = 1$, the distribution function $F$ of the variance 1 sum $W = X + Z$ satisfies

$$\|F - \Phi\|_1 \le 2\sigma_X^2\|G^* - G\|_1.$$



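Proof. For $X = (B - p)/\sqrt{pq}$, by (18), we have

$$G^*(x) = \sqrt{pq}\Big(x + \frac{p}{\sqrt{pq}}\Big) \quad \text{for } x \in \Big[-\frac{p}{\sqrt{pq}}, \frac{q}{\sqrt{pq}}\Big],$$

that is, $X^*$ is equal in distribution to $(U - p)/\sqrt{pq}$, where $U \sim \mathcal{U}[0, 1]$. Hence, by Proposition 2.1,

$$\|G^* - G\|_1 = \Big\|\frac{U - p}{\sqrt{pq}} - \frac{B - p}{\sqrt{pq}}\Big\|_1 = \frac{\|U - B\|_1}{\sqrt{pq}} = \frac{p^2 + q^2}{2\sqrt{pq}},$$

since the distribution functions of $U$ and $B$ differ by $|x - q|$ on $[0, 1]$, so that $\|U - B\|_1 = \int_0^1 |x - q|\, dx = (p^2 + q^2)/2$. The claim now follows by (17) of Theorem 2.1.

For the uniform distribution $\mathcal{U}[-\sqrt{3}, \sqrt{3}]$, (18) yields

$$G^*(x) = -\frac{x^3}{12\sqrt{3}} + \frac{\sqrt{3}\,x}{4} + \frac{1}{2} \quad \text{for } x \in [-\sqrt{3}, \sqrt{3}].$$

Now applying (3), with $G(x) = (x + \sqrt{3})/(2\sqrt{3})$ on the same interval, we obtain

$$\|G^* - G\|_1 = \int_{-\sqrt{3}}^{\sqrt{3}} \frac{|x(3 - x^2)|}{12\sqrt{3}}\, dx = \frac{\sqrt{3}}{8},$$

and the claim follows by (17). The final claim of the corollary follows from (16) with $n = 2$ and the fact that the normal is a fixed point of the zero bias transformation. $\square$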
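The uniform computations in this proof can be checked numerically. The sketch below evaluates $G^*$ directly from (18) with a midpoint-rule expectation and compares it with the closed form above. It then integrates $|G^* - G|$ as in (3). If the proof is right, the result is about $\sqrt{3}/8 \approx 0.2165$, and the coefficient of $E|U_1|^3/\sqrt{n}$ produced by (17), namely $2\|G^* - G\|_1 / E|U_1|^3$, is about $1/3$. The grid sizes are arbitrary choices, and the function names are our own.

```python
import math

R3 = math.sqrt(3.0)

def g_star_from_18(x, m=400):
    """G*(x) = E[X (X - x) 1(X <= x)] for X ~ U[-sqrt 3, sqrt 3], midpoint rule."""
    if x <= -R3:
        return 0.0
    upper = min(x, R3)
    h = (upper + R3) / m
    total = sum(u * (u - x) for u in (-R3 + (k + 0.5) * h for k in range(m)))
    return total * h / (2.0 * R3)       # the uniform density is 1/(2 sqrt 3)

def g_star_closed(x):
    """Closed form from the proof, valid on [-sqrt 3, sqrt 3]."""
    return -x ** 3 / (12.0 * R3) + R3 * x / 4.0 + 0.5

def g_uniform(x):
    """Distribution function G of U[-sqrt 3, sqrt 3] on its support."""
    return (x + R3) / (2.0 * R3)

m = 400
h = 2.0 * R3 / m
grid = [-R3 + (k + 0.5) * h for k in range(m)]
l1 = sum(abs(g_star_from_18(x) - g_uniform(x)) for x in grid) * h   # integral (3)

third_moment = 3.0 * R3 / 4.0           # E|U_1|^3
constant = 2.0 * l1 / third_moment      # coefficient of E|U_1|^3 / sqrt(n) in (17)
print(l1, R3 / 8.0, constant)
```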
Corollary 2.1 yields constants 1 and 1/3 for the standardized Bernoulli and the uniform, respectively. Though it is perhaps of greater interest that such constants can be computed explicitly as a function of the underlying distribution, the following proposition gives a bound for the nonidentically distributed case in terms of a universal constant $c_1$, which can be shown to be at most 3. In particular, let

$$c_1 = \sup \frac{2E|X^* - X|}{E|X|^3}, \tag{19}$$

where the supremum is taken over all $X$ with $EX = 0$, $EX^2 = 1$, $E|X|^3 < \infty$ and $E|X^* - X| = \|X^* - X\|_1$, that is, with $X^*$ achieving the minimal coupling to $X$.

Proposition 2.2. For $F$ the distribution function of any variance 1 sum $W = \sum_{i=1}^n X_i$ of independent mean zero variables $X_i$, $i = 1, \ldots, n$,

$$\|F - \Phi\|_1 \le c_1 \sum_{i=1}^n E|X_i|^3, \quad \text{where } c_1 \le 3.$$

Proof. Let $X$ have mean zero, variance 1, and finite absolute third moment, and let $X^*$ be any variable on the same space as $X$, having the $X$-zero bias distribution. Applying (1) with $f(x) = \frac{1}{2}x^2\operatorname{sgn}(x)$, for which $f'(x) = |x|$, yields

$$E|X^*| = \frac{1}{2}E|X|^3.$$

By the triangle inequality and Hölder's inequality, using $EX^2 = 1$ to bound $E|X|$ by $E|X|^3$, we have

$$E|X^* - X| \le E|X^*| + E|X| \le \frac{1}{2}E|X|^3 + E|X|^3 = \frac{3}{2}E|X|^3,$$

yielding $c_1 \le 3$.

Dropping the requirement that $EX^2 = 1$ in (19), by scaling we have

$$c_1 = \sup \frac{2\,\mathrm{Var}(X)\, E|X^* - X|}{E|X|^3}, \tag{20}$$

where the supremum is taken over all $X$ with $EX = 0$, $0 < EX^2 < \infty$, $E|X|^3 < \infty$ and $X^*$ achieving the minimal coupling to $X$.

Now, with $(X_i, X_i^*)$ achieving the minimal coupling for $i = 1, \ldots, n$, (16) and (20) yield

$$\|F - \Phi\|_1 \le 2\sum_{i=1}^n \sigma_i^2 E|X_i^* - X_i| = \sum_{i=1}^n \Big(\frac{2\sigma_i^2 E|X_i^* - X_i|}{E|X_i|^3}\Big) E|X_i|^3 \le c_1 \sum_{i=1}^n E|X_i|^3. \qquad \square$$



Finally, we remark that as the supremum in (19) is taken over a class of 
random variables determined by two constraints, the content of [13] and [15] 
suggests that it may be attained on a three-point distribution. 
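To see how the binomial bound of Corollary 2.1 compares with the actual $L^1$ distance, the sketch below computes $\|F - \Phi\|_1$ for the standardized binomial $\mathcal{B}(n, p)$. It integrates $|F - \Phi|$ over a truncated range with a fine midpoint rule and compares the result with $(p^2 + q^2)/\sqrt{npq}$. The truncation at $\pm 12$, the grid size and the choice $p = 0.2$ are assumptions of the sketch. Only the standard library is used, with $\Phi$ written in terms of `math.erf`.

```python
import math
from bisect import bisect_right

def std_normal_cdf(t):
    """Phi(t), written with math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def binomial_l1_to_normal(n, p, lo=-12.0, hi=12.0, steps=100_000):
    """Approximate ||F - Phi||_1 for F the cdf of (S - np)/sqrt(npq), S ~ Binomial(n, p).

    The integral (3) is truncated to [lo, hi] and evaluated by the midpoint rule.
    """
    q = 1.0 - p
    sd = math.sqrt(n * p * q)
    atoms = [(k - n * p) / sd for k in range(n + 1)]      # standardized support
    cdf, running = [], 0.0
    for k in range(n + 1):
        running += math.comb(n, k) * p ** k * q ** (n - k)
        cdf.append(running)                                 # P(S <= k)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h
        j = bisect_right(atoms, t)                          # number of atoms <= t
        F = cdf[j - 1] if j > 0 else 0.0
        total += abs(F - std_normal_cdf(t))
    return total * h

p = 0.2
q = 1.0 - p
for n in (5, 20, 80):
    exact = binomial_l1_to_normal(n, p)
    bound = (p * p + q * q) / math.sqrt(n * p * q)          # Corollary 2.1
    print(n, round(exact, 4), round(bound, 4))
```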

3. Coupling constructions. In this section, we present two constructions which may be used to obtain a variable $Y^*$ having the $Y$-zero bias distribution in the presence of dependence. The first applies when $Y$ is a sum of the components of a coordinate-symmetric vector defined in (8). The second construction, which first appeared in [10], uses the exchangeable pair $(Y', Y'')$ of Stein satisfying the linearity condition (9). We begin by reviewing the construction for size biasing as presented in [11], as both zero bias constructions below are related to it.



3.1. Size biasing. The zero bias characterization (1) is similar to, and, indeed, was motivated in [10] by, the characterization of the size biased distribution $Y^s$ for a nonnegative random variable $Y$ with finite expectation $\mu$,

$$EYf(Y) = \mu Ef(Y^s), \tag{21}$$

holding for all functions $f$ for which $E|Yf(Y)| < \infty$. Under the nontriviality condition $P(Y = 0) < 1$ or, equivalently, the condition $\mu > 0$, the characterization (21) is easily seen to be the same as the more common specification of the size bias distribution $F^s(y)$ as the one which is absolutely continuous with respect to the distribution $F(y)$ of $Y$ with Radon-Nikodym derivative

$$\frac{dF^s(y)}{dF(y)} = \frac{y}{\mu}. \tag{22}$$
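For a nonnegative $Y$ with finitely many values, (22) simply reweights each atom $y$ by $y/\mu$. The sketch below builds the size biased probabilities this way for a hypothetical distribution and test function. It confirms (21) exactly in rational arithmetic. Note that the atom at $0$ receives no mass under $Y^s$.

```python
from fractions import Fraction

values = [0, 1, 2, 5]                                      # hypothetical support of Y >= 0
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(3, 8), Fraction(1, 8)]

mu = sum(w * y for y, w in zip(values, probs))             # mu = EY
size_biased = [w * y / mu for y, w in zip(values, probs)]  # dF^s/dF = y/mu, as in (22)

def f(y):
    return y * y + 3                                       # any test function

lhs = sum(w * y * f(y) for y, w in zip(values, probs))           # E[Y f(Y)]
rhs = mu * sum(ws * f(y) for y, ws in zip(values, size_biased))  # mu E[f(Y^s)]
print(mu, size_biased, lhs == rhs)
```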

For the construction of $Y^s$ when $Y = \sum_{i=1}^n Y_i$, the sum of the components of a vector $\mathbf{Y}$ of nonnegative dependent variables with finite means $\mu_i = EY_i$, following [11], we note that for every $i = 1, \ldots, n$, there exists a distribution $\mathbf{Y}^{(i)}$ such that for all functions $f : \mathbb{R}^n \to \mathbb{R}$ for which the expectation on the left-hand side exists,

$$EY_i f(\mathbf{Y}) = \mu_i Ef(\mathbf{Y}^{(i)}); \tag{23}$$

we say that $\mathbf{Y}^{(i)}$ has the $\mathbf{Y}$-size biased distribution in direction $i$. By specializing (23) to the case where $f$ depends only on $y_i$, we recover (21), showing that $Y_i^{(i)} =_d Y_i^s$, that is, that the $i$th component of $\mathbf{Y}^{(i)}$ has the $Y_i$-size bias distribution.

Without loss of generality, by removing any trivial components of $\mathbf{Y}$ for which $\mu_i = 0$ and lowering the dimension of $\mathbf{Y}$ accordingly, we may express (23) in the language of (22): denoting the distribution of $\mathbf{Y}$ as $F(\mathbf{y})$, the distribution $F^{(i)}(\mathbf{y})$ of $\mathbf{Y}^{(i)}$ is given by

$$dF^{(i)}(\mathbf{y}) = \frac{y_i}{\mu_i}\, dF(\mathbf{y}), \tag{24}$$

that is, $\mathbf{Y}^{(i)}$ is absolutely continuous with respect to $\mathbf{Y}$, with Radon-Nikodym derivative $y_i/\mu_i$. Now, as shown in [11], choosing an independent index $I \in \{1, \ldots, n\}$ proportional to the mean of the components of $\mathbf{Y}$, that is, according to the distribution (6), where $\sigma_i^2$ is replaced by $\mu_i$, the variable

$$Y^s = \sum_{j=1}^n Y_j^{(I)} \tag{25}$$

has the $Y$-size biased distribution.

Hence, by randomization over $I$, a construction of $\mathbf{Y}^{(i)}$ for every $i$ leads to one for $Y^s$. We may accomplish the former as follows. Write the joint distribution of $\mathbf{Y}$ as a product of the marginal distribution of $Y_i$ times the conditional distribution of the remaining variables given $Y_i$,

$$dF(\mathbf{y}) = dF_i(y_i)\, dF(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n \mid y_i), \tag{26}$$

which gives a factorization of (24) as

$$dF^{(i)}(\mathbf{y}) = dF_i^{(i)}(y_i)\, dF(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n \mid y_i), \quad \text{where } dF_i^{(i)}(y_i) = \frac{y_i}{\mu_i}\, dF_i(y_i). \tag{27}$$

Comparing the relation in (27) between the marginal distributions $F_i^{(i)}(y_i)$ and $F_i(y_i)$ with (22) provides an alternate way of seeing that $Y_i^{(i)} =_d Y_i^s$. The representation (27) says that one may form $\mathbf{Y}^{(i)}$ by first generating $Y_i^{(i)}$ having the $Y_i$-size biased distribution, and then the remaining variables from their original distribution, conditioned on $y_i$ taking on its newly chosen size biased value. For $\mathbf{Y}$ already given, a coupling between $\mathbf{Y}$ and $\mathbf{Y}^{(i)}$ can be generated by constructing $Y_i^{(i)}$ and then "adjusting" as necessary the remaining variables $Y_j$, $j \ne i$, so that these have the conditional distribution given $Y_i$ taking on its new value. Typically, the goal is to adjust the variables as little as possible in order to make the resulting bounds to normality small; see [9] and [11] for examples.

In the case where $Y_i$, $i = 1, \ldots, n$, are independent, clearly $Y_j^{(i)} =_d Y_j$ for all $j \ne i$. Hence, the construction given above reduces to simply choosing one summand at random with probability proportional to its expectation and replacing it with its biased version. We note that in both zero and size biasing, a finite sum $Y = \sum_i Y_i$ of independent variables is biased by choosing a summand at random and then replacing the randomly chosen variable by a biased version; in size biasing, the variable is chosen proportional to its expectation and in zero biasing, to its variance. The zero bias transformation was so named due to its similarity to size biasing and its application to mean zero random variables.
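The zero bias half of this remark is the construction (7). The Monte Carlo sketch below assumes summands $Y_i = c_i\varepsilon_i$ with hypothetical scales $c_i$ and independent fair signs $\varepsilon_i$. It picks $I$ with probability $\sigma_i^2/\sigma^2$ as in (6) and replaces $Y_I$ by $c_I V$ with $V \sim \mathcal{U}[-1, 1]$, which is the zero biased version of $c_I\varepsilon_I$. It then compares both sides of (1) for $f = \sin$. The left side is computed exactly by enumerating sign patterns, so the two sides agree only up to the Monte Carlo error of the right side.

```python
import math
import random
from itertools import product

rng = random.Random(35)
scales = [0.5, 1.0, 2.0]                 # hypothetical c_i; Y_i = c_i * eps_i
variances = [c * c for c in scales]      # sigma_i^2 = Var(Y_i)
sigma2 = sum(variances)                  # sigma^2 = Var(Y)

def exact_lhs():
    """E[Y sin Y], averaging over all 2^n equally likely sign patterns."""
    patterns = list(product((-1.0, 1.0), repeat=len(scales)))
    total = 0.0
    for signs in patterns:
        y = sum(c * e for c, e in zip(scales, signs))
        total += y * math.sin(y)
    return total / len(patterns)

def draw_y_star():
    """One draw of Y* = Y - Y_I + Y_I^* from construction (7)."""
    ys = [c * rng.choice((-1.0, 1.0)) for c in scales]
    i = rng.choices(range(len(scales)), weights=variances)[0]    # P(I = i) as in (6)
    return sum(ys) - ys[i] + scales[i] * rng.uniform(-1.0, 1.0)  # (c_i eps_i)^* = c_i V

m = 200_000
lhs = exact_lhs()                                                    # E[Y f(Y)], f = sin
rhs = sigma2 * sum(math.cos(draw_y_star()) for _ in range(m)) / m    # sigma^2 E[f'(Y*)]
print(lhs, rhs)   # equal up to Monte Carlo error
```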

3.2. Coordinate symmetric variables. Of the two zero bias constructions presented here, the one for coordinate symmetric random vectors $\mathbf{Y} \in \mathbb{R}^n$ as defined in (8) is closest to the size biasing construction just described. To begin, note that for all $Y$ such that $EY^2 < \infty$, by replacing the variable $Y$ on the left-hand side of (21) by $Y^2$, we can define the square bias distribution $Y^\square$ of $Y$ by the characterization

$$EY^2 f(Y) = EY^2\, Ef(Y^\square)$$

for all functions $f$ for which the expectation of the left-hand side exists. Naturally, when $Y$ has mean zero and variance $\sigma^2$, this identity becomes

$$EY^2 f(Y) = \sigma^2 Ef(Y^\square). \tag{28}$$

To make an extension analogous to the one from (21) to (23) for size biasing, let the components of $\mathbf{Y} \in \mathbb{R}^n$ have mean zero and finite variances $\mathrm{Var}(Y_i) = \sigma_i^2$. For such $\mathbf{Y}$, for all $i = 1, \ldots, n$, there exists a distribution $\mathbf{Y}^i$ such that for all functions $f : \mathbb{R}^n \to \mathbb{R}$ for which the expectation of the left-hand side exists,

$$EY_i^2 f(\mathbf{Y}) = \sigma_i^2 Ef(\mathbf{Y}^i); \tag{29}$$

we say that $\mathbf{Y}^i$ has the $\mathbf{Y}$-square biased distribution in direction $i$. By specializing (29) to the case where $f$ depends only on $y_i$, we recover (28), showing that $Y_i^i =_d Y_i^\square$, that is, that the $i$th component of $\mathbf{Y}^i$ has the $Y_i$-square bias distribution.

By removing any component of $\mathbf{Y}$ which is constant and lowering the dimension accordingly, we can assume, without any loss of generality, that each component is nontrivial, that is, that $\sigma_i^2 > 0$ for every $i = 1, \ldots, n$. Parallel to (24) in the case of size biasing, we may now equivalently specify the $\mathbf{Y}^i$ distribution as the one which is absolutely continuous with respect to $\mathbf{Y}$, with

$$dF^i(\mathbf{y}) = \frac{y_i^2}{\sigma_i^2}\, dF(\mathbf{y}). \tag{30}$$

Now, let $\mathbf{Y}$ be coordinate-symmetric as defined in (8). Applying (8) marginally and pairwise yields $Y_i =_d -Y_i$ and $(Y_i, Y_j) =_d (-Y_i, Y_j)$ for all $i$ and distinct $i, j$, respectively. Hence, when all components of $\mathbf{Y}$ have finite second moments, taking the following expectations using these distributional equalities yields

$$EY_i = 0 \ \text{ for all } i \quad \text{and} \quad EY_iY_j = 0 \ \text{ for all } i \ne j. \tag{31}$$
Proposition 3.1 shows how to construct the zero bias distribution $Y^*$ for the sum $Y$ of the components of a coordinate-symmetric vector in terms of $\mathbf{Y}^i$ and a random index in a way that parallels the construction for size biasing given in (25). We let $\mathcal{U}[a, b]$ denote the uniform distribution on $[a, b]$.

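Proposition 3.1. Let $\mathbf{Y} \in \mathbb{R}^n$ be a coordinate-symmetric vector as in (8), with $\mathrm{Var}(Y_i) = \sigma_i^2 \in (0, \infty)$ for all $i = 1, 2, \ldots, n$ and

$$Y = \sum_{i=1}^n Y_i, \qquad \sigma^2 = \mathrm{Var}(Y) = \sum_{i=1}^n \sigma_i^2.$$

Let $\mathbf{Y}^i$, $i = 1, \ldots, n$, have the "square bias" distribution given in (29), let $I$ be a random index independent of $\mathbf{Y}$ and $\{\mathbf{Y}^i, i = 1, \ldots, n\}$ with distribution

$$P(I = i) = \frac{\sigma_i^2}{\sigma^2}, \tag{32}$$

and let $U \sim \mathcal{U}[-1, 1]$ be independent of all other variables. Then

$$Y^* = UY_I^I + \sum_{j \ne I} Y_j^I \tag{33}$$

has the $Y$-zero bias distribution.

Proof. Let $f$ be an absolutely continuous function with $E|Yf(Y)| < \infty$, and note that $Y_i^i \ne 0$ almost surely by (30). Averaging over the index $I$, integrating out the uniform variable $U$ and then applying (29) and (8) to obtain the third and fifth equalities below, respectively, we have

$$\begin{aligned}
\sigma^2 Ef'(Y^*) &= \sum_{i=1}^n \sigma_i^2\, Ef'\Big(UY_i^i + \sum_{j \ne i} Y_j^i\Big) \\
&= \sum_{i=1}^n \sigma_i^2\, E\bigg[\frac{f\big(Y_i^i + \sum_{j \ne i} Y_j^i\big) - f\big(-Y_i^i + \sum_{j \ne i} Y_j^i\big)}{2Y_i^i}\bigg] \\
&= \sum_{i=1}^n E\bigg[Y_i^2\, \frac{f\big(Y_i + \sum_{j \ne i} Y_j\big) - f\big(-Y_i + \sum_{j \ne i} Y_j\big)}{2Y_i}\bigg] \\
&= \frac{1}{2}\sum_{i=1}^n E\Big[Y_i f(Y) - Y_i f\Big(-Y_i + \sum_{j \ne i} Y_j\Big)\Big] \\
&= \frac{1}{2}\sum_{i=1}^n E\big[Y_i f(Y) + Y_i f(Y)\big] \\
&= EYf(Y).
\end{aligned}$$

Thus, $Y^*$ has the $Y$-zero bias distribution. $\square$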
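Here is a one-coordinate sanity check of (33). A vector with i.i.d. $\mathcal{N}(0, 1)$ components is coordinate symmetric, and the normal is its own zero bias distribution. So for $n = 1$, $UY_1^1$ should again be standard normal. For $Y \sim \mathcal{N}(0,1)$, the square bias density $y^2\varphi(y)$ is that of a random sign times a chi variable with 3 degrees of freedom. This is a standard fact assumed here. The moments agree: $E(UY^\square)^2 = \frac{1}{3}\cdot 3 = 1$ and $E(UY^\square)^4 = \frac{1}{5}\cdot 15 = 3$. The Monte Carlo sketch below draws $UY^\square$ this way and compares its sample moments with $0$, $1$ and $3$.

```python
import math
import random

rng = random.Random(8)

def square_biased_standard_normal():
    """Y^square for Y ~ N(0, 1): a random sign times a chi(3) variable (assumed fact)."""
    chi3 = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))
    return chi3 if rng.random() < 0.5 else -chi3

# n = 1 case of (33): Y* = U * Y_1^1 with U ~ U[-1, 1] independent.
samples = [rng.uniform(-1.0, 1.0) * square_biased_standard_normal() for _ in range(200_000)]
m = len(samples)
mean = sum(samples) / m
second = sum(s ** 2 for s in samples) / m
fourth = sum(s ** 4 for s in samples) / m
print(mean, second, fourth)   # should be close to the N(0, 1) values 0, 1 and 3
```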

The construction for zero biasing implicit in Proposition 3.1 is parallel to the one given in Section 3.1 for size biasing. The factorization (26) suggests that we write (30) as

$$dF^i(\mathbf{y}) = dF_i^i(y_i)\, dF(y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n \mid y_i), \quad \text{where } dF_i^i(y_i) = \frac{y_i^2}{\sigma_i^2}\, dF_i(y_i); \tag{34}$$

the relation given in (34) between the marginal distributions $F_i^i(y_i)$ and $F_i(y_i)$ provides an alternate way of seeing that $Y_i^i =_d Y_i^\square$. As for the size biasing construction in Section 3.1, given $\mathbf{Y}$, Proposition 3.1 and (34) now give a coupling between $Y$ and $Y^*$. An index $I = i$ is chosen with weight proportional to the variance $\sigma_i^2$, the summand $Y_i$ is replaced by $Y_i^i$ having that summand's "square bias" distribution and then multiplied by $U$, and, finally, the remaining variables are adjusted according to their original distribution, given that the $i$th variable takes on the value $Y_i^i$. This construction will be applied in Section 4.

3.3. Use of the exchangeable pair. Let $Y$ be a mean zero random variable with finite, nonzero variance. The following description of a coupling of $Y$ to a $Y^*$ having the $Y$-zero biased distribution appears in [10]; its simple proof and some of the consequences below needed for the constructions in Sections 5 and 6 appear in [9].

Proposition 3.2. Let $Y', Y''$ be an exchangeable pair with $\mathrm{Var}(Y') = \sigma^2 \in (0, \infty)$ and distribution $F(y', y'')$ which satisfies the linearity condition (9). Then

$$EY' = 0 \quad \text{and} \quad E(Y' - Y'')^2 = 2\lambda\sigma^2, \tag{35}$$

and when $Y^\dagger, Y^\ddagger$ have distribution

$$dF^\dagger(y', y'') = \frac{(y' - y'')^2}{E(Y' - Y'')^2}\, dF(y', y''), \tag{36}$$

and $U \sim \mathcal{U}[0, 1]$ is independent of $Y^\dagger, Y^\ddagger$, the variable

$$Y^* = UY^\dagger + (1 - U)Y^\ddagger$$

has the $Y'$-zero biased distribution.
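For completeness, the identities (35) follow in two lines from exchangeability and (9). This short derivation is ours, not quoted from [9]:

```latex
\begin{aligned}
EY' &= EY'' = E\,E(Y'' \mid Y') = (1-\lambda)\,EY'
      \;\Longrightarrow\; \lambda\,EY' = 0 \;\Longrightarrow\; EY' = 0,\\
E(Y'-Y'')^2 &= EY'^2 + EY''^2 - 2E\big[Y'\,E(Y'' \mid Y')\big]
      = 2\sigma^2 - 2(1-\lambda)\sigma^2 = 2\lambda\sigma^2 .
\end{aligned}
```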

The following construction of $Y^\dagger, Y^\ddagger$ is in the same spirit as the ones given in Sections 3.1 and 3.2. Given $Y'$, first construct $Y''$ close to $Y'$, such that $(Y', Y'')$ is exchangeable and satisfies (9), and use it to form the difference $Y' - Y''$. Then, perhaps independently, construct the parts of $Y^\dagger, Y^\ddagger$ which depend on the "square biased" term $(Y' - Y'')^2$. Finally, construct the remaining parts of $Y^\dagger, Y^\ddagger$ by adjusting the corresponding parts of $Y', Y''$ to have their original joint distribution, given the newly generated variables.
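Here is a toy instance of this recipe. It is our own illustration, not one of the paper's examples. Let $Y' = \sum_{i=1}^n \varepsilon_i$ with i.i.d. fair signs. Form $Y''$ by replacing a uniformly chosen $\varepsilon_J$ with an independent copy, so that (9) holds with $\lambda = 1/n$. The difference $Y' - Y''$ is nonzero only when the new sign differs. Square biasing by $(Y' - Y'')^2$ therefore amounts to forcing the replaced coordinate to flip. Keeping $S = \sum_{i \ne J}\varepsilon_i$ fixed gives $Y^\dagger = S + \varepsilon_J$, $Y^\ddagger = S - \varepsilon_J$ and $Y^* = S + (2U - 1)\varepsilon_J$. The Monte Carlo sketch checks (35) and compares both sides of (1) for $f = \sin$, up to simulation error. The value $n = 6$ is arbitrary.

```python
import math
import random

rng = random.Random(3)
n = 6                      # hypothetical number of signs
lam = 1.0 / n              # E(Y'' | Y') = (1 - 1/n) Y', so lambda = 1/n in (9)
sigma2 = float(n)          # Var(Y') for a sum of n independent signs

def draw():
    """Return (Y', Y'', Y*) on one probability space."""
    eps = [rng.choice((-1, 1)) for _ in range(n)]
    y1 = sum(eps)                                   # Y'
    j = rng.randrange(n)
    y2 = y1 - eps[j] + rng.choice((-1, 1))          # Y'': coordinate j resampled
    s = y1 - eps[j]                                 # S, kept fixed
    u = rng.random()
    # Y^dagger = S + eps_j and Y^ddagger = S - eps_j, so U Y^dagger + (1 - U) Y^ddagger is:
    y_star = s + (2.0 * u - 1.0) * eps[j]
    return y1, y2, y_star

draws = [draw() for _ in range(200_000)]
m = len(draws)
msd = sum((a - b) ** 2 for a, b, _ in draws) / m            # estimates E(Y' - Y'')^2
lhs = sum(a * math.sin(a) for a, _, _ in draws) / m         # E[Y' f(Y')], f = sin
rhs = sigma2 * sum(math.cos(c) for _, _, c in draws) / m    # sigma^2 E[f'(Y*)]
print(msd, 2.0 * lam * sigma2, lhs, rhs)
```

In this toy case $(2U - 1)\varepsilon_J$ is uniform on $[-1, 1]$, so the construction reproduces (7) for a sum of independent signs.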

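We can describe the constructions used in Sections 5 and 6 in a bit more detail. The pair $Y', Y''$ is a function of some collection of underlying random variables $\{\xi_\alpha, \alpha \in \mathcal{X}\}$ and an index $\mathbf{I} \subset \mathcal{X}$, possibly random but independent of $\{\xi_\alpha, \alpha \in \mathcal{X}\}$, and the difference $Y' - Y''$ depends only on $\{\xi_\alpha, \alpha \in \mathbf{I}\}$, that is, for some collection of functions $b_{\mathbf{i}}(\xi_\alpha, \alpha \in \mathbf{i})$,

$$Y' - Y'' = b_{\mathbf{I}}(\xi_\alpha, \alpha \in \mathbf{I}). \tag{37}$$

Since one may first generate $\mathbf{I}$, then $\{\xi_\alpha, \alpha \in \mathbf{I}\}$, and finally $\{\xi_\alpha, \alpha \notin \mathbf{I}\}$ conditional on $\{\xi_\alpha, \alpha \in \mathbf{I}\}$, we may write the joint distribution of all of the variables as

$$dF(\mathbf{i}, \xi_\alpha, \alpha \in \mathcal{X}) = P(\mathbf{I} = \mathbf{i})\, dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \mathbf{i})\, dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \mathbf{i} \mid \xi_\alpha, \alpha \in \mathbf{i}). \tag{38}$$

Now, consider the distribution $F^\dagger$, which is $F$-square biased by $(Y' - Y'')^2$:

$$dF^\dagger(\mathbf{i}, \xi_\alpha, \alpha \in \mathcal{X}) = \frac{(y' - y'')^2}{E(Y' - Y'')^2}\, dF(\mathbf{i}, \xi_\alpha, \alpha \in \mathcal{X}). \tag{39}$$

Using (35) and (37), we obtain

$$2\lambda\sigma^2 = E(Y' - Y'')^2 = Eb_{\mathbf{I}}^2(\xi_\alpha, \alpha \in \mathbf{I}) = \sum_{\mathbf{i}} P(\mathbf{I} = \mathbf{i})\, Eb_{\mathbf{i}}^2(\xi_\alpha, \alpha \in \mathbf{i}),$$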

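so, in particular, we may define a distribution for an index $\mathbf{I}^\dagger$ with values in subsets of $\mathcal{X}$ by

$$P(\mathbf{I}^\dagger = \mathbf{i}) = \frac{r_{\mathbf{i}}}{2\lambda\sigma^2} \quad \text{with } r_{\mathbf{i}} = P(\mathbf{I} = \mathbf{i})\, Eb_{\mathbf{i}}^2(\xi_\alpha, \alpha \in \mathbf{i}).$$

Hence, substituting (37) and (38) into (39),

$$\begin{aligned}
dF^\dagger(\mathbf{i}, \xi_\alpha, \alpha \in \mathcal{X})
&= \frac{P(\mathbf{I} = \mathbf{i})\, b_{\mathbf{i}}^2(\xi_\alpha, \alpha \in \mathbf{i})}{2\lambda\sigma^2}\, dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \mathbf{i})\, dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \mathbf{i} \mid \xi_\alpha, \alpha \in \mathbf{i}) \\
&= \frac{r_{\mathbf{i}}}{2\lambda\sigma^2}\, \frac{b_{\mathbf{i}}^2(\xi_\alpha, \alpha \in \mathbf{i})}{Eb_{\mathbf{i}}^2(\xi_\alpha, \alpha \in \mathbf{i})}\, dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \mathbf{i})\, dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \mathbf{i} \mid \xi_\alpha, \alpha \in \mathbf{i}) \\
&= P(\mathbf{I}^\dagger = \mathbf{i})\, dF_{\mathbf{i}}^\dagger(\xi_\alpha, \alpha \in \mathbf{i})\, dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \mathbf{i} \mid \xi_\alpha, \alpha \in \mathbf{i}),
\end{aligned} \tag{40}$$

where

$$dF_{\mathbf{i}}^\dagger(\xi_\alpha, \alpha \in \mathbf{i}) = \frac{b_{\mathbf{i}}^2(\xi_\alpha, \alpha \in \mathbf{i})}{Eb_{\mathbf{i}}^2(\xi_\alpha, \alpha \in \mathbf{i})}\, dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \mathbf{i}),$$

giving a representation of $dF^\dagger(\mathbf{i}, \xi_\alpha, \alpha \in \mathcal{X})$ parallel to the one for $dF(\mathbf{i}, \xi_\alpha, \alpha \in \mathcal{X})$ in (38). This parallel representation gives a parallel construction as well: first generate $\mathbf{I}^\dagger$, then $\{\xi_\alpha, \alpha \in \mathbf{i}\}$ according to $dF_{\mathbf{i}}^\dagger$ and, finally, $\{\xi_\alpha, \alpha \notin \mathbf{i}\}$ according to $dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \mathbf{i} \mid \xi_\alpha, \alpha \in \mathbf{i})$.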

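For the two examples in Sections 5 and 6, the index $\mathbf{I}$ is uniform over some range, so by (40), $\mathbf{I}^\dagger$ takes values in that same range, and $\mathbf{I}^\dagger$ and $\{\xi_\alpha, \alpha \in \mathbf{I}^\dagger\}$ are jointly drawn from the distribution with proportionality

$$dF^\dagger_{\mathbf{I}^\dagger}(\mathbf{i}, \xi_\alpha, \alpha \in \mathbf{i}) \propto b_{\mathbf{i}}^2(\xi_\alpha, \alpha \in \mathbf{i})\, dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \mathbf{i}). \tag{41}$$

With $\mathbf{I}$ and $\{\xi_\alpha, \alpha \in \mathcal{X}\}$ given, the coupling proceeds by generating $\mathbf{I}^\dagger$ and $\{\xi_\alpha, \alpha \in \mathbf{I}^\dagger\}$ according to (41), then adjusting the remaining given variables. To make the bounds small, the goal is to change as little as possible, so that the zero biased variable is close to the original.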

In Section 5, this procedure results in $S$, a function of the variables which can be kept fixed throughout the construction, and variables $T'$, $T^\dagger$ and $T^\ddagger$ on a joint space such that

$$Y' = S + T', \quad Y^\dagger = S + T^\dagger \quad \text{and} \quad Y^\ddagger = S + T^\ddagger \tag{42}$$

and hence

$$|Y^* - Y'| = |UT^\dagger + (1 - U)T^\ddagger - T'|.$$

Here, the underlying variables $\{\xi_\alpha, \alpha \in \mathcal{X}\}$ are $\{X', X'', X_2, \ldots, X_n\}$ and the difference $Y' - Y'' = X' - X''$, so that $\mathbf{I}$ is nonrandom, that is, it indexes the variables $X', X''$ with probability 1, and $b(X', X'') = X' - X''$.

In Section 6, $\{\pi(k), k \in \{1, \ldots, n\}\}$ play the role of $\{\xi_\alpha, \alpha \in \mathcal{X}\}$, $\mathbf{I} = \{I, J\}$ is uniform over all pairs of distinct indices in $\{1, \ldots, n\}$ and the difference $Y' - Y''$ is given by

$$b_{\{i,j\}}(\pi(k), k \in \{i, j\}) = (a_{i\pi(i)} + a_{j\pi(j)}) - (a_{i\pi(j)} + a_{j\pi(i)}). \tag{43}$$

Note that even when $\mathbf{I}$ is uniformly distributed, the index $\mathbf{I}^\dagger$ need not be; in particular, the distribution (94) given by (41) with $b_{\mathbf{i}}$ as in (43) selects the indices $\mathbf{I}^\dagger = \{I^\dagger, J^\dagger\}$ jointly with their "biased permutation" images $\{K^\dagger, L^\dagger\}$ with probability that preferentially makes the squared difference large. We return to the exchangeable pair construction in Sections 5 and 6.

4. Projections of cone measure on the sphere $S(\ell_p^n)$. In this section, we use the zero biasing construction in Section 3.2 to derive Theorem 4.1, providing bounds to normality for projections $\theta \cdot \mathbf{X}$, where $\mathbf{X} \in \mathbb{R}^n$ has cone measure $\mathcal{C}_p^n$ on the sphere $S(\ell_p^n)$, defined in (12) and (11), respectively, and $\theta \in \mathbb{R}^n$ has unit length. The resulting bound (55) is in terms of explicit small constants [see also (13)] and depends on $\theta$ through the factor $\sum_i |\theta_i|^3$, which yields the best possible rate of $n^{-1/2}$ when the components of $\theta$ are equal.
In the case $p = 2$, cone measure is uniform on the surface of the unit Euclidean sphere in $\mathbb{R}^n$ and [7] shows that the $k$-dimensional projections of $\mathbf{X}$ are close to normal in total variation. The authors of [16] derive normal approximation bounds using Stein's method for random vectors with symmetries in general, including coordinate symmetry, considering the supremum and total variation norms. Studying here the specific instance of cone measure allows for the sharpening of general results to this particular case.

Cone measure is uniform on $S(\ell_p^n)$ only in the cases $p = 1$ and $p = 2$, and the authors of [18] obtain a total variation bound between cone and uniform measure for $p > 1$. In some sense, then, the contribution here is related to the central limit problem for convex bodies, which strives to quantify when projections of uniform measure on high-dimensional convex bodies have some one-dimensional projection close to normal. A large body of work in this area is generally concerned with the measure of the set of directions on the unit sphere which give rise to approximately normally distributed projections and does not provide bounds in terms of specific projections; see, in particular, [1] and [5] for work continuing that of [24]. In principle, the techniques developed here can be used to shed light on aspects of the central limit theorem for convex bodies; see the remarks at the end of this section.

Let X € be an exchangeable coordinate-symmetric random vector 
with components having finite second moments and let £ R" have unit 
length. Then, by (31), the projection of X along the direction 6, 



has mean zero and variance equal to the common variance of the com- 
ponents of X. To form Y* using the construction outlined in Section 3.2, 
as seen in (34) in particular, requires a vector of random variables to be 
"adjusted" according to their original distribution, conditional on one co- 
ordinate taking on a newly chosen, biased, value. Random vectors which 
have the "scaling conditional" property in Definition 4.1 can easily be so 
adjusted. Let C{V) and C{V\X = x) denote the distribution of V, and the 
conditional distribution of V given X = x, respectively. 

Definition 4.1. Let X = {Xi , . . . , X„) be an exchangeable random vec- 
tor and P C R the support of the distribution of Xi . If there exists a function 
g-.V^R such that P(g(Xi) = 0) = and 



n 



Y = Y,e,x, 



I) 



i=l 



(44) £(X2,...,X„|Xi = a 




for all a G 2? 



then we say that X is scaling (^-conditional or, more simply, scaling-conditional. 
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Proposition 4.1 is an application of Theorem 1.1 to projections of scaling- 
conditional vectors. 

Proposition 4.1. Le< X G R" be an exchangeable, coordinate- symmetric 
and scaling g-conditional random vector with finite second moments and, 
with 6 € R" of unit length, set 

n 

Y = Y,diXi, a^ = Vsii{Y) and F{x) = P{Y / a < x). 
1=1 

Then any construction of (X, X^) on a joint space for each i = 1, . . . ,n with 
XI having the Xi-square biased distribution provides the upper bound 



(45) ||F-^>||i<-S 



ej{uxj-Xr) + ('^-i)j:e,x, 



where P{I = i) = 0f, U ^U[—l,l] and I and U are independent of each 
other and of the remaining variables. 

Proof. For all i = l,...,n, since X is scaling (^-conditional, given X 
and XI, the vector 

^^ _ g{Xi) g{Xl) g(Xt) 

has the X-square bias distribution in direction i as given in (29); in par- 
ticular, for every h for which the expectation on the left-hand side below 
exists, 

(46) EXjhpC) = EXfEh{X.'). 

We now apply Proposition 3.1 to Y = {9iXi, . . . , OnXn)- First, the coor- 
dinate symmetry of Y follows from that of X. Next, we claim 

Y* = {9iX\, . . .,9nXU 

has the Y-square bias distribution in direction i. Given /, applying (46) 
with 

h{X)=f{9iXu...,enXn) = f{Y) 

and then multiplying both sides by 9f yields 

E9fx!f(Y) = E9fxfEfiY') or EY^f{Y) = EY^Ef{Y'). 

Finally, since X is exchangeable, the variance of Yi is proportional to 9f and 
the distribution of / in (32) specializes to the one claimed. 
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Now, (33) of Proposition 3.1 yields, with Y* having the y-zero biased 
distribution, 

n 

Y*-Y = UY} + E^/ - 

n 

= ueiX\ + Y^e,x]-Y^e,Xi 

i^i i=i 
= OiiUX'i -Xi) + Y, %(X/ - X,) 



ej{ux\-x,) + Y^e,{'j^^-i)x 



ejiuxj-x,) + {^^-i)j:o,x,. 



The proof is completed by dividing both sides by a, noting that Y* /a = 
(Y/a)*, and invoking Theorem 1.1. □ 

Proposition 4.2 shows that Proposition 4.1 can be applied when X has 
cone measure. We denote the Gamma and Beta distributions with param- 
eters a, (3 as r(a,/3) and B{a,(3), respectively, and the Gamma function at 
X by r(x). 

Proposition 4.2. Let denote cone measure as given in (12) for 
some p> 0. 

1. Cone measure Cp is exchangeable and coordinate-symmetric. For 
y^j^j — 1,...,?^} independent variables with Gj ~r(l/p, 1) and ej 
taking values —1 and +1 with equal probability, 

(47) X=(^ei(^^j (^^j j~Cp«, where Ga,, = Y^G, 



I- 

i=a 



2. The common marginal distribution Xi of cone measure is characterized 
by 

Xi=a-X, and ~ S(l/p, (n - l)/p), 

and the variance (T^p = Var(Xj) is given by 

2 r(3/p)r(n/p) 2/p 2 r(3/p) 

(48) (7„ „ = , ^, , , satisfying lim n '^a„ „ = ^ ,^ , , . 

V ; n,p Y{l/p)T{{n + 2)/p) ''^ ^ n~^oo "'P r(l/p) 
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3. The square bias distribution XI of Xi is characterized by 

(49) Xl=d-Xl and ~ B(3/p, (n - l)/p). 

In particular, letting {Gj,G'j,ej,j = l,...,n} be independent variables 
with Gj ~ T(l/p, 1) , G'j ~ r(2/p, 1) and £j taking values —1 and +1 with 
equal probability, for each i = 1, . . . ,n, a construction of (X.,Xl) on a 
joint space is given by the representation of^ in (47) along with 



(50) 



Gi + G' 



(51) 



The mean m^^p = -E'l-'^i | for all i = 1, . . . ,n is given by 

r(4/p)r((n + 2)/p) 



T{3/p)T{{n + 3)/p) 

and satisfies 

l/(pVl) 

\3/p) ^''^ """'^-^ + 
4. Cone measure C" is scaling (1 — Ixl^y^P -conditional. 



(52) lim n^^^mn r, = (ind nin » < 

^ ^ n^oo "'P r(3/p 



The proof of Proposition 4.2 is deferred to the end of this section. Before 
proceeding to Theorem 4.1, we remind the reader of the fohowing known 
facts about the Gamma and Beta distributions; see [4], Theorem 1.2.3 for 
the case n = 2 of the first claim, the extension to general n and the following 
claim being straightforward. For 7^ ~ T{ai, 1), i = 1, . . . , n, independent and 
oii > 0, 

(53) 71 +72 ~ r(ai + 02, 1), — — ~i3(ai,a2), and 

71+72 



71 



i=i It Lj=i 7i 



and ^7i are independent; 



i=l 



the Beta distribution B{a,P) has density 
(54) 



r(a + /3) a-iri \/3-ii 



r(a + K)r(a + /3) 

and K > U moments , , , 

r(a + /3 + K)r(a) 
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Theorem 4.1. Let X have cone measure Cp on the sphere S{ip) for 
some p> and let 

n 
i=l 

he the one- dimensional projection of X along the direction 6 G R" with 
||0|| = 1. Then with cr^p = Var(Xi) and rrin^p = -E'l^il given in (48) and 
(51), respectively, and F the distribution function of the normalized sum 
W = Y/an,p, 

(55) \\F-ni< 3(^) E + (- V l) 

V <yn,p J fr[ \p J n + 2 

where ^ is the cumulative distribution function of the standard normal. 

We note that by the limits in (48) and (52), the constant mn^p/an,p that 
multiphes the sum in the bound (55) is of the order of a constant, with 
asymptotic value 



lim HhML = r(4/p)Vr(iM 
(Tn,p r(3/p)3/2 

Since, for 6 G R" with \\6\\ = 1, we have 



the second term in (55) is always of smaller order than the first, so the decay 
rate of the bound to zero is determined by l^jP- The minimal rate \pn 
is achieved when 0j = 1/ \/n. 

In the special cases p = 1 and p = 2, Cp is uniform on the simplex 

Z)r=i 1^*1 — 1 ^^^'^ ^^it Euclidean sphere X^iLi^? — respectively. By 
(48) and (51) for p=l. 



2 , 3 



n{n +1) ' n + 2 ' 



and, using (52) for p = 2, 



1 



0-^2 = - and ?n„,2 < , „, 
these relations yield 



^ = 3,/^^^^<A and ^<J^<V3. 
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Substituting into (55) now gives the claim (13). 

Proof of Theorem 4.1. Using Proposition 4.2, we apply Proposition 
4.1 for X with g{x) = (1 — |x|p)"'^/p and the joint construction of (X,Xj-) 
given in item 3. 

Using the triangle inequality on (45) yields the upper bound 



(56) 



{E\ei{UXj-Xi)\+E 



For X with the common marginal of X, we have 



E 



X 



cr. 



n,p 



< E 



X 



a. 



n,p 



2\ 1/2 



l< E 



X 



a. 



n,p 



3\ 1/3 



< E 



X 



'•n,p 



which, with X^ having the square bias distribution of X, implies that 

E\X\^ 



E\X\ < 



■E\X'-\. 



cr, 



n,p 



Bounding the first term in (56) by applying the triangle inequality, using 
the fact that U is independent of / and Xj, E\U\ = 1/2 and P{I = i) = 9^ 
yields 



E\ei\{\uxj\ + \xj\) = ^|^/|(i|x/| + \Xi\) = EJ2 + \Xi 



(57) 



i=l 

n 



— IE l^il^-^l-'^i I — i"^n,pX/ 



1=1 



i=l 



Now, averaging the second term in (56) over the distribution of / yields 



(58) 



E 



9{Xi) 



1 E^.-^.- 



Using (47), (50) and g{x) = (1 — |x|^) 'p, we have 



i=l 

p\l/p 



(59) 



9{Xt 



l/p 



ijE^.-^i 



1. 



The variable G[ and, by (53), the sum Gi^n are independent of Xi, . . . 
hence, the term (59) is independent of the sum it multiplies in (58) and 
therefore equals 



(60) 



E^ 

i=l 



9{Xi 



9{Xi 



1 



E 



^^jXj 
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To bound the first expectation in (60), since (Gi^„ + G9 ~ B{n/p,2/p), 
we have 



(61) 



E 



1 



E\ 1 



G 



l/p\ /I 

< I - V 1 

p 



n + 2 



since for p > 1, using (54) with k = 1, 



^ 1 



Gl^n 



1/p 



< E 1 



n/p 



{n + 2)/p n + 2' 

while for < p < 1, using Jensen's inequality and the fact that (1 — x)^/^ > 
1 — x/p for X <\, 



E[ 1 



Gin 



Gin + G'- 



i/p 



< 1 



1 



Gin 



Gl^n + 



1/p 



n 



n + 2 



1/p 



< 



p{n + 2) 



We may bound the second expectation in (60) by an,p since 

2 / \ 2 



E 



Neither this bound nor the bound (61) depends on i, so substituting them 
into (60) and summing over i, again using X^i^-? = 1) yields 



(62) 



E^ 

i=l 



g{x^) 



1 



E 



E 



V 1 



n + 2 



Adding (57) and (62) and multiplying by 2/cj„,p in accordance with (45) 
yields (55). □ 

Proof of Proposition 4.2. 

1. For A C 5'(£p), e = (ei, . . . , e„) G { — 1, 1}" and a permutation vr G 5„, let 

Ae = {x:(eiJ;i,...,e„,x„,) G^} and = {x : (x^(i), . . . , a;^(„)) G ^}. 

By the properties of Lebsegue measure, /i"([0,l]Ae) = /u"([0,l]A^) = 
//"■([0, 1]^), so by (12), cone measure is coordinate symmetric and ex- 
changeable. 

Next, [20], for instance, shows that 



(63) 



1|, . . . , \Xn 



Gi \ 



G 



l,n 



Gn 

Gi, 
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Letting C and Ce be the distribution functions of X ~ Cp and (eiXi, . . . , 
e„X„), respectively, the coordinate symmetry of X imphes that 

C7(x) = C7e(x) for aU eG {-1,1}", 

so averaging yields 

ee{-l,l}" 

Therefore, for ej, i = l,...,n, i.i.d. variables taking the values 1 and 
— 1 with probability 1/2, we conclude that X =rf (eiXi, . . . , =rf 
. . . ,e„|X„|). Combining this fact with (63) yields (47). 

2. Applying the coordinate symmetry of X coordinatewise gives Xi =d —Xi 
and (63) yields \XiY' = Gi/Gi^m which has the claimed Beta distribution, 
by (53). As EXi = 0, we have 

(64) Yav{X,) = EXf = E{\Xi\Pf/P 

and the variance claim in (48) follows from (54) for a = 1/p, /3 = (n — l)/p 
and K = 2 /p. The limit in (48) follows from the fact that for all n,x > 0, 

(65 lim — — ^ = 1, 

n^oo r(n + X) 

which can be shown using Stirling's formula. 

3. If X is symmetric with variance and X^ has the X-square bias density, 
then for all odd functions /, since —X'^f{X) =dX'^f{X), 

, -I, EX^f(-X) E(-X^f(X)) E(X^f(X)) , 1, 

Ef{-X^) = ^ = = ^ = Ef{X^), 

cr^ (T^ (7^ 

showing that X^ is symmetric. 

From (54) and a change of variables, X satisfies \X\p ~ B{a/p,(3/p) if 
and only if the density p\x\iu) of |X| is 

(66) P\^\^^^ = ria/p)r{p/pf (^-"^ ^-[°'^]- 

Hence, since \Xi\P ~ B{l/p, {n — l)/p) by item 2, the density p\Xi\{'^) of 
\Xi\ is 

^i^'i^"^ = r(i/p)r((n-i)/p) ^^-") 

Multiplying by and renormalizing produces the density 

. ._ ^V|(n) 

(67) 

- ^^«^ + ^)/^) .^(l-.^)^^-)/-^^,,,!], 



r(3/p)r((n - i)/p) 
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and comparing (67) to (66) shows the second claim in (49). The repre- 
sentation (50) now follows from (53) and the symmetry of X^. 

As in (64), the moment formula (51) follows from (54) for a = 3/p,(3 = 
(n — l)/p and k = 1/p, and the limit in (52) follows by (65). Regarding 
the last claim in (52), for p>l, Holder's inequality gives 



m. 



"n,p ■ 



E\X^\ < {E\X^\Py/P -- 



i/p 



while for < p < 1 , we have 



m 



n,p " 



E\X' 



E 



Gin + G'j 



<E 



n + 2 

Gj + G'i 
Gi,n + G'J n + 2 



4. We consider the conditional distribution on the left-hand side of (44) 
and use the representation (and notation Ga^b) given in (47). The second 
equality below follows from the coordinate symmetry of X, and the fourth 
follows since we may replace Gi^n by G2,n/{^ — |op) on the conditioning 
event. Further, using the notation aC{V) for the distribution of aV, we 
have 



C{X2, . . . ,Xn\Xi - 

G2 



c(e2 



i/p 



(68) 



C[e2 
(1 



Gl^n 

G2_ 

Gl^n 

G2_ 

Gl^n 

a?YlPC 



i/p 



i/p 



£2 



G2 



G \ ^/"P 



Gl^n 
Gn 



\ i/p 

Gl^nJ 

G \ ^Iv 

1/p 



Gl^n, 

Gi \ ^Ip 



G\^n 
G2,n 



il-\a\P)^/Pc(e2 



il-\a\P)^/Pc(e2 



G2,n, 

G2 yip 

G2,n 

G2_ 

G2,n 



G \ 



1/p 



7 



7 



G2,n 

Gn 
G2,n 

Gn 
G2,n 



G2,n 



1/p 



1/p 



Gl;n 

Gi 



Gl^n 



71-1 



= g{a)C; 

In the penultimate step, we remove the conditioning on Gi/Gi^n since 
(53) and the independence of Gi from all other variables gives that 

" ^ is independent of {Gi,G2n) 



Go.n ' ' G 



2,n 
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and so, in particular, is independent of Gi/{Gi + G2,n) = Gi/Gi^n- 

Regarding the right-hand side of (44), using 1 — \Xi\'p = Yll=2 1^*1^ 
the representation (47), we obtain 

g{a){X2,...,Xn)/g{Xr) = g{a)^ (^2,...,X„) 



9{K 



9[a, 



{\X2\P + --- + \Xn\Py/P, 

( {e2{G2/Gi,nY'P, en{Gn/Gi,ny/P) \ 
\ {{G2/Gi,n) + --- + {Gn/Gi,n)y/P J 

f {£2Gy^ , . . . ,enGn^) \ 
\ {G2 + --- + GnYlP ) 

G2 \ ^'^ f Gn\ ^'"^ 



9y.a)\e2\-^\ ,...,£„, 

=d9{a)C;-\ 

matching the distribution (68). □ 

In principle. Proposition 3.1 can be applied in conjunction with Theorem 
1.1 for any coordinate-symmetric vector where one can construct a cou- 
pling between the marginal variables and their square biased versions, and 
where conditional distributions such as the one on the left-hand side of (44) 
can be handled. For X having the uniform distribution over a convex body 
symmetric to the coordinate planes, the conditional distributions of interest 
are uniform over the intersection of the body with the hyperplanes Xi = a. 
The marginal coupling appears to be more elusive, but may be especially 
tractable when the body has some particular shapes. 

5. Simple random sampling. We provide an bound for the error in 
the normal approximation of the sum 



n 

(69) y=T.^^ 

i=l 

of a simple random sample of size n from a set ^ of real numbers, not 
all equal. It is straightforward to verify that Y has mean and variance o"^ 
given by 

/^,^^ - , 2 n(N-n) v^, _,2 , _ 1 

(70) n = na and a = ——— r > (a — a] where a= — > a. 

^ ' ^ N{N -1) ^ ^ ' N ^ 

The bound below depends also on 03, the third-moment-type quantity given 
by 

aeA 
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Theorem 5.1. Let {Xi, . . . ,Xn} be a simple random sample of size n 
from a set A of N real numbers, not all equal, with n and N satisfying 

(71) 2<n<N-l. 

Then, with the sum Y given by (69), the distribution function F of the 
standardized variable W = (Y — fi)/a satisfies 

" cjS \n{n-i)Jv^nJ 

Using n/N < 1, we see that the theorem provides the "universal" upper 
bound 1603/0"^, although if the samphng fraction n/N is close to 1/2, the 
bound improves substantially, close to l.lbas/a"^ . 

Since W and a^/a^ are invariant upon replacing a by {a — a) / J J^beAi^ ~ '^)^ 



we may assume below, without loss of generality, that the collection A sat- 
isfies 

(72) H" = and ^ = 1. 

If we consider a sequence of collections of N numbers, not all equal, then 
the bound will be of (the best) order 1 / \/iV as ^ 00 if the deviations a — a, 
a£ An are comparable and the sampling fraction n/N is bounded away from 
and 1; in particular, under (72), will be of order 1, the deviations of 
order \/\fN and 03 (and therefore the bound) of order 1/y/N. 

Proof. By (72), 

2 n{N -n) I i3 

a = — ; and 03 = > a , 

so it suffices to prove that 

(73, |,.-.|,,,i^(,,^)'. 

Since distinct labels may be appended to the elements of A^ say as a 
second coordinate which is neglected when taking sums, we may assume 
that the members of A are distinct. In addition, and for convenience only, we 
consider all samples from A as though drawn sequentially, that is, obtained 
with order. 

Inequality (71) is imposed so that various expressions have simpler forms 
[see, e.g., (84)], in order to leave at least one unsampled individual with 
which to form an exchangeable pair, and also to yield 

(74) A= / £(0,1). 

n(iV — n) 
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To form an exchangeable pair, let X' , X" ,X2, ■ ■ ■ , X^ be a simple random 
sample of size n + 1 from A, that is, with distribution 

P{X' =x',X" = x",X2 = X2,...,Xn = Xn) 

= N^^-^l{{x' ,x" ,X2, . ■ ■ ,Xn} C A, distinct), 
where = N\/{N — k)], the falling factorial. The pair 

n n 

Y' = X' + Y,Xi and Y" = X" + Y^ Xi 

i=2 i=2 

is clearly exchangeable with common marginal distribution that of y in (69). 
Since 

E{X'\Y') = -Y' and E{X"\Y') = —Y', 

n N — n 

with A as in (74), we have 

E{Y"\Y') = E{Y' -X' + X"\Y') = (1 - \)Y' , 

proving that linearity condition (9) is satisfied. 

We now follow the construction of the zero bias variable outlined in 
Section 3.3. Since Y' - Y" = X' - X\ choose X\X^ independently of 
X', X", . . . , X„, and with distribution proportional to the squared dif- 
ference (y' - y")^ = {X' — X")'^, that is, according to the distribution 

(75) g(«,^)=(^_il!i({a,6}c^). 

Now, the remainder of the sample from which we will construct y^ and Y^^ 

must have the conditional distribution of X2 , ■ ■ ■ , Xn given 

X^, that is, it must be a simple random sample of size n — 1 from ^\ 

{X^,Xt}. 

However, we would like these n — 1 variables to correspond as closely as 
possible to the values in {X2, ■ ■ ■ , Xn} - For this reason, consider the difference 
and intersection 

y = {X2,...,Xn}\{X\X^ and TZ' = {X2, . . . , x^} n {x\ x^. 

The difference set =5^ contains the variables in our original sample which can 
be used in the sample taken according to the conditional distribution given 
the inclusion of X^ and X^, and TZ' contains the variables which cannot 
be common to both samples, that is, variables which must be replaced by 
others when forming Y^ and Y^. In particular, if the intersection TZ' is empty, 
then {X2, ■ ■ ■ ,Xn} serves as the size n — 1 simple random sample from the 
complement of {X^^,^-'-}. Otherwise, TZ' is of size 1 or 2 and variables in TZ' , 
in the order given by their indices, are replaced by those in a set 7^t, of the 
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same size as TZ' , obtained by taking a simple random sample from the values 
available, that is, from the complement of 

Q = {X2,...,Xn}U{X\X^}. 

In each case, the total resulting collection of the n — 1 variables thus 
obtained are uniform from ^\ {X^^jX-f}, that is, they have the conditional 
distribution of X2, . . . , X„ given X' = X\X" = X^; hence, (42) holds with 

5 = ^ a, 

T'=Y,a + X\ T"=J2a + X", 
r1"=^a + Xl" and T*=^a + X*. 

With U ^ V([0,1] independent of all other variables, by Proposition 3.2, a 
coupling of the zero biased variable Y* and Y' is given by 

Y* = UX^ + (1 - U)X^ + S+ J2 a and Y' = X' + S + J2 a, 

and therefore their difference V is given by 

V = Y* -Y' = UX^ + (1 - U)X^ -X' + J2 « - J2 ^■ 

Now, using X'^ =^ X"^ and the independence of U , we may bound E\V\ by 



(76) 


E\V\ <E\X^\+E\X'\ + E 




+ E 















We bound the four terms of (76) separately. 
Since E{X')^ = 1/N, we have 

E\^/nX'\ < E{VNX'f = 1 < {E\VNX'f)^^^ < E\VNX'f, 
which gives the following bound on the second term of (76): 

^- > \a\=l^J\X-\ = ^=±J\VI\X'\<- 



(77) 



N 



El 



^ eiVnx'i < -^eiVnx'i^ 



N 



EK 

aeA 



as- 



From (75), the marginal distribution of X^ equals 



for a £ A. 



9i(«) = 7T(a^ + ^ 



Therefore, for the first term in (76), using (77), we have 

(78) E\X^\ = Y: \a\qi{a) = i ^ |a|3 + _L ^ |a| < a^. 



aeA 



aeA 
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Moving to the last term in (76), since {X2, ■ ■ ■ and X'^ ^X"^ are inde- 
pendent, for any a G ^, 

P{a £ n') =P{a£{X2,..., X„} n {X^X^}) 

= P{a G {X2, . . .,Xn})P{a G {X^X^}) 



= 2P{ae{X2,...,Xn})P{X^ = a) 
which imphes that 
E 



n — 1 
N 



N 



(79) 



<EJ2 |a| = H |a|P(aG7^') 

aG7^' aeA 



n — 1 



N 



n — 1 

N 



1 



2n 
iV 



03, 



using (77). 

Beginning in a similar way for the third term in (76), since P(|7^^|g{0, 1,2})=! 
and P{a G 7^^ {11^=0) = for all a, we have 



E 



(80) 



ae7^t 



< J2 |a|^(ae^^) 



: ^ |a|P(aG7^^|7^"^| = l)+ ^ |a|P(aG7^^|7^"^| =2). 

a67et aG7^t 



By independence, the joint distribution of {X2, ■ ■ ■ ,Xn) and X"^ ,X^, whose 
realizations are denoted Xn-i and u,v, respectively, is given by 

(81) p{xn-i,u,v) = {N)~^^l{{x2,...,Xn} CA, distinct)g('u, t;) , 

with q[u,v) as in (75). Without further mention we consider only the event 
of probability one where Xn-i is composed of distinct elements and u^v. 
Although Xn-i is ordered, with a slight abuse of notation, we treat Xn-\ as 
an unordered set in expressions containing set operations, such as Xn-i H 
\u^v\. Taking B to be an ordered subset of A of size 1 or 2, the conditional 
distribution that 7^^ = 5, given Xn-i and u,u, is uniform over all sets the 
size of the intersection of Xn-i and f, taken from the complement of their 
union, that is, 



p(B\xn-\,u,v) 



1 



{X - \Xn-\^ {U,V]\)\B\ 

X \{B n (xn-i u {n, v}) = 0, 1^1 = Ixn-i n {u, w}|). 
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In particular, then, for B of size 1, using (81), we have 

P{aen\\n^\ = l) = P{0'\Xn-l,U,v)p{Xn~l,U,v) 

Xn-l,U,V 

= 2 p{a\xn-i,u,v)p{xn-i,u,v) 
1 1 

= ^ M — iTr\ — ^("'^) 

N — n{N)n-i 
uexn-iMxn-i ^ ' 



2 \ 1 



rs9^ 2(n-l)(iV- 3)^-2 ^ , . 

Y ^(^'^) 

{u,v}^a 

(83) = ^^"^ t:; 

where, in (82), the factor (A^ — 3)„_2 counts the number of ways that the 
n — 2 additional elements required in Xn-i can be taken from the — 3 
available and the n — 1 counts the number of positions that u could occupy 
in the ordered set Xn-i- In addition, in the last equality, we have used 



(AT 


-n){N)n-i 


2(n- 


l)(Af-n + l) 






2(n- 


l){N-n + l) 


m3 



2\ 



Eq(u,v) = —- (u — v)"^ = —- (u^ — iuv+v 

r w <^-, 

= — {u^ — uv) = — — — y^ uv 

N ^ ' N 

-^^-"^ 

Dropping the — term in (83) to get an upper bound and using (77) 
and the fact that > n > 3, we have the following upper bound on the first 
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term in (80): 

(84) 

2(n-l)(iV-n + l) 2n 
N{N-2) - iV 

To handle the second term in (80), we have, hkewise, for a and b distinct, 
P{T^'^ = {a,b)) = P(.0',b\xn-i,u,v)p{xn-i,u,v) 

({it,v}UXn-i)n{a,6}=0 

= (iV-n+l).(iV)L, ^^^^ 5 ^ ' 

{u,v}r\{a,b}=0 Xn--lD{u,v}, 
Xn-in{a,fe}=0 

(n-l)2 (iV- 4)^-3 



(A^-n+l)2 (iV)„_l r 1 r .1 

(n-l)2 1 , . 

y )^ y l-^ {u,i'}n{a,b}=0 



(iV)2 (Af-2)2iV 



((Ar_2)(l-a^-5^)-(a + 6)^). 



Using symmetry, summing over 6 7^ a and multiplying by 2 (since a can 
be chosen as the first or second variable in the set 71) of size 2) yields 

P(a G 7^t, |7^t| = 2) = ^i^-^^_l-^l((Ar- i)(Ar_3) - {N^ --,N)a\ 

By (71), > 3, over which range the factor — (iV^ — 3A^) multiplying a? 
is negative; discarding it yields the upper bound 

Pia G 7^t |7^t| - 2) < 2(n-l)2(iV-l)(iV-3) l _ 2(n - 1)2 
naG/<. |-2j< ^^^^ (iV-2)2 iV"iV2(iV-2)' 

so, by (77), 

(85) E |a|P(a G ^^ |7^t | = 2) < ^^J^as < 2 'as- 
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Inequalities (80), (84) and (85) yield the upper bound on the third term in 
(76), 



(86) E 



Combining the bounds on the four terms of (76) given in (77), (78), (79) 
and (86) gives 



n ^2 



E\V\<2i^l + -j as. 

By (72), EY = 0, so W = Y/a and since W* = (Y/a)* = Y*/a, Theorem 1.1 
gives 

, , , 2E\V\ Aas f n\2 

\\F - $ 1 < 2E\W* -W\ = — < — 1 + — 

a a \ J\ , 

which is (73). □ 

6. Combinatorial central limit theorem. We now use Theorem 1.1 to 
derive bounds for random variables Y of the form 

n 

(87) F = ^a,,,(,), 

1=1 

where vr is a permutation distributed uniformly over the symmetric group 
Sn and {aij}i<ij<n are the components of a matrix A S R"^". Letting 

-j^ n \ ^ \ ^ 

= Z2 II "^i ' = -Yl ^ij and a.j = -J2 "^i ' 

I V . ■ ^ ft.-, It.-, 

t,] = l J = l 2 = 1 

straightforward calculations show that the mean fi and variance cr^ of Y are 
given by 

1 2 \ — ^ / 2 2 2 2 \ 

fi = na.. and a = } ^(a^,- — a^. — a.,- + a..); 

hi 

the fact that (94) below is a probability distribution yields an equivalent 
representation for cj^. 



In what follows, we assume for the sake of nontriviality that cr^ > 0. By (89), 
cr^ = if and only if an — ai, does not depend on i, that is, if and only if the 
difference between any two rows of A is some constant row vector. 
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Motivated by deriving approximating null distributions for permutation 
test statistics, Wald and Wolfowitz [25] proved the central limit theorem as 
n ^ oo for the case where the factorization aij = biCj holds. This was later 
generalized by Hoeffding [12] to arrays {ajj}i<j .j<„ in general. Motoo [17] 
gave Lindeberg-type sufficient conditions for the normal limit to hold. 

In the supremum norm, von Bahr [2] and Ho and Chen [14] obtained 
Berry-Esseen bounds when the matrix A is random, which yield the cor- 
rect rate 0(n~^/^) only under some boundedness conditions. Bolthausen 
[6] obtained a bound of the correct order in terms of third-moment-type 
quantities, but with an unspecified constant. Goldstein [9] gave bounds of 
the correct order under boundedness, but with an explicit constant, for the 
cases where the random permutation vr is uniformly distributed and also 
when its distribution is constant on cycle type. 

For each n, Theorem 6.1 provides an bound between the standardized 
variable Y given in (87) and the normal, with an explicit constant depending 
on the third-moment-type quantity 

n 

(90) «3 = X! I'^v ~ "-i- ~ "-i + a..|^- 

When the elements of A are all of comparable order, cj^ is of order n and as 
of order n^, making the bound below of order n"^/^. 

Theorem 6.1. For n>3, let {ajj}"^^]^ be the components of a matrix 
A G R"^", let 11 he a random permutation uniformly distributed over Sn and 
let Y be given by (87). Then, with fi, o"^ and given in (88) and (90), F the 
distribution function ofW = (Y — fj,) ja and $ that of the standard normal, 

„^ ^„ as / 56 8 \ 
F-ni<- 1—16 + - - + - -2 . 

Proof. Since 

n 

Y - n = Xl(oj,7r(i) - ai- - a.,r(i) + a..), 

i=l 

without loss of generality, we may replace aij by Oij — Oj. — a.j + a., in which 
case 

n n 

(91) ^ aij = ^ aij = 

i=i j=i 

and (90) becomes as = J2ij jajjl^- We will write Y and n interchangeably for 
Y' and vr'. 
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Construction of Y"^ ,Y^: We follow the construction outlined in Section 
3.3; see also [9]. For I < i,j < n, let Tij be the permutation which transposes 
i and j. Given vr', take (/, J) independent of vr', uniformly over all pairs 
1 < / 7^ J < n, that is, with distribution 

(92) p,(ij) = J-l(i^j). 

Now, set vr" = tt'tij and let Y" be given by (87) with vr" replacing tt. In 
particular, vr"(i) = vr'(i) for i ^ {/, J}, so 

(93) y' - Y" = {ajy^) + ajy(^j)) - {ajy(^j) + ajy(^i)). 

We note that the difference depends only on /, J, 7r'(/), 7r'(J) having distri- 
bution pi{i,j)pi{k,l), where k and I are the realizations of tt'^I) and 7r'(J), 
respectively. It can easily be shown (see [9]) that the pair Y' ,Y" is exchange- 
able and satisfies the linearity condition (9) with \ = 2/{n — 1). 

To construct (^+,1"*) with distribution [y' - y"f dP{y' ,y")/ E{Y' - Y"f 
of (36), note first, using (93) and then (35) for the second equality, that 

EiY' - Y"f = ^ ,„ [(«ifc + - («*' + = ^A't' = 

n^n — Ir.'^, n— 1 

noting that the summand is zero ii i= j or k = l. Still following the outline 
given in Section 3.3, to begin the construction of Y'^ and Y^, choose l\ J^, 
K\ L'^ independently of the remaining variables, according to their original 
distribution biased by the difference (93) squared, that is, with distribution 

P. (, k^)= ^^""+;gr^y:)t"^"^^ i (^^ 3)v. (kd 

(94) 

_ [{aik + aji) - {an + Qjfc)]^ . 

An^{n — l)cr^ ' 

in particular, P(/t = jt) = p(i^t = /^t) = q. Now, set 

f ^^.-Hi^t),jt, if Lt = vr(/t),Kt^vr(jt), 

^^ = S ^^W(Ltyt, if Lt/vr(/t),Kt=^(jt), 

[ 7rT^-i(Kt),/tT^-i(Lt),jt, Otherwise, 

and TT^ = 7rtrjt,jt. Note that {7rt(/t), vrt( jt)} = {vr^ (jt), vr^ ( jt)} = {A't, Lt}. 
As the conditional distribution of tt, given that it takes particular values on 
some collection of indices, is uniform over all permutations restricted to take 
those values, the variables Y'^ and Y^ given by (87) with tt replaced by vr^^ 
and TT-I-, respectively, have joint distribution (36). 
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Calculation of E\Y* —Y'\: By Proposition 3.2, 
Y* -Y' = UY^ + (1 - U)Y^ - Y' 

n n n 

1=1 i=l i=l 

With 

T={/t,jt}u{7r-i(irt),^-i(Lt)}, 

we see that if z, j ^ T, then 7r(i) = 7r^(j) = 7r-'-(j). 
Hence, setting V = Y* — Y' , we have 

(95) ^ = "i.'rt© + (1 - ^^)ai,7rt(i) - ai^-K(i))- 

Further, letting 

i2=|{7r(/t),7r(jt)}n{A't,Lt}| 
and Ife = = fc), since < 2) = 1, we have 

V = VI2 + Vli + Vlo and therefore 

(96) 

E\V\<E\V\l2 + E\V\li + E\V\lo. 

The three terms on the right-hand side of (96) give rise to the three compo- 
nents of the bound in the theorem. 

For notational simphcity, the following summations in this section are 
performed over all indices which appear, whether in the summands or in 
a (possibly empty) collection of restrictions. In what follows, we will have 
equalities and bounds such as 

Y] kiil [(flifc + aji) - {an + a^fc)]^ 

(97) 

= I (^ik + «i« + 4 + ^ as- 

Due to the form of the square on the left-hand side, if the factors in a cross 
term agree in their first index, they will have differing second indices and 
likewise if their second indices agree. This gives cross terms which are zero 
by virtue of (91), since they will have at least one unpaired index outside 
the absolute value over which to sum, for instance, the index k in the term 
J2\'^u\^ikCi'U- Hence the equality. The inequality follows from the fact that 
for any choices ti, L2, ki, K2 S {i,j, k, 1} with li / ki and L2 / K2, perhaps by 
relabeling the indices appearing after the inequality, 

/ \ 1/3 / \ 2/3 

(98) E l"^iAi|aL«2< El«»jf El^fc'H =n^a3. 

i,j,k,l \ k,l / \ i,j J 



BOUNDS IN NORMAL APPROXIMATION 37 

Generally, the power of n in such an inequality, in this case 2, will be 2 less 
than the number of indices of summation, in this case 4. 

Decomposition on R = 2: On I2, X= {I^,J^}. As the intersection which 
gives R = 2 can occur in two different ways, we make the further decompo- 
sition 

yi2 = yi2,l + 1^12,2, 

where l2,i = = /^^v^(Jt) = L^) and 12,2 = l(vr(I"^) = Lt,7r(jt) = 

K^). Since vr"!" = vr on l2,i, by (95), 

^l2,i= 51 (f^«i,7rt(i) + (1 - f^)ai,^t(i) -ai,^(j))l2,i 

iG{/t,Jt} 
= [t^(a/t,7rt(7t) +ajt,7rt(Jt)) 

+ (1 - t/)(ajt,7r}(/t) + «Jt,7rt(Jt)) - («/t,7r(/t) + O Jt Jt ) )] l2,l 

(99) = [f/(ajt,^(/t) + ajt,^(jt)) 

+ (1 - t/)(Ojt,,r(Jt) + «Jt,7r(/t)) - (a/t,,r(7t) + ajt,,r(Jt))] l2,l 
= (1 - ?7)(ajt,,r(Jt) + «Jt,7r(/t) - «/t,,r(7t) " « Jt ,7r( Jt ) ) l2,l 

= (1 - C/)(ajt,Lt + ajt,xt - a/t,xt - ajt,Lt)l2,i- 

Because of the indicator $\mathbf{1}_{2,1}$, taking the expectation of (99) requires a joint distribution that includes the values taken on by $\pi$ at $I^\dagger$ and $J^\dagger$, say $s$ and $t$, respectively. The pair $(s,t)$ can be any pair of distinct values and is independent of $I^\dagger,J^\dagger,K^\dagger$ and $L^\dagger$. Hence, with $p_1$ and $p_2$ given in (92) and (94), respectively,

\begin{align*}
p_3(i,j,k,l,s,t)&=P\big((I^\dagger,J^\dagger,K^\dagger,L^\dagger,\pi(I^\dagger),\pi(J^\dagger))=(i,j,k,l,s,t)\big)\\
&=p_2(i,j,k,l)\,p_1(s,t)=\frac{[(a_{ik}+a_{jl})-(a_{il}+a_{jk})]^2}{4n^3(n-1)^2\sigma^2}\qquad(s\neq t).\tag{100}
\end{align*}
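The factorization in (100) depends on $(\pi(I^\dagger),\pi(J^\dagger))$ being uniform over pairs of distinct values. The sketch below assumes $\pi$ is uniform over permutations and reads $p_1$ in (92) as the uniform law on distinct pairs; both readings are ours. It checks that law by exact counting for $n=5$. It also confirms that the closed form in (100) sums to one when $\sigma^2=\sum_{i,j}a_{ij}^2/(n-1)$, the variance of $Y$ for a doubly centered array. The indices, seed and size are arbitrary.

```python
import itertools
from collections import Counter
from fractions import Fraction

import numpy as np

n = 5
perms = list(itertools.permutations(range(n)))

# Exact law of (pi(i), pi(j)) for one fixed pair of distinct indices i != j,
# with pi uniform over all n! permutations (counting, no sampling).
i, j = 0, 3
counts = Counter((p[i], p[j]) for p in perms)
p1 = {st: Fraction(c, len(perms)) for st, c in counts.items()}

# Closed form of p3 from (100), with sigma^2 = sum a_ij^2 / (n - 1)
# for a doubly centered array a (row and column sums zero).
rng = np.random.default_rng(1)
b = rng.standard_normal((n, n))
a = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()
sigma2 = np.sum(a ** 2) / (n - 1)
D = (a[:, None, :, None] + a[None, :, None, :]
     - a[:, None, None, :] - a[None, :, :, None]) ** 2
p3_fixed_st = D / (4 * n**3 * (n - 1) ** 2 * sigma2)  # same value for every s != t
p3_total = p3_fixed_st.sum() * n * (n - 1)            # add up over the n(n-1) pairs
print(p1[(1, 2)], p3_total)
```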
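Now, bounding the absolute value of the first term in (99) using (97),

\begin{align*}
E|(1-U)a_{I^\dagger,L^\dagger}|\mathbf{1}_{2,1}&=\frac12\sum_{i,j,k,l,s,t}|a_{il}|\mathbf{1}(s=k,t=l)\,p_3(i,j,k,l,s,t)=\frac12\sum_{i,j,k,l}|a_{il}|\,p_3(i,j,k,l,k,l)\\
&=\frac{1}{8n^3(n-1)^2\sigma^2}\sum_{i,j,k,l}|a_{il}|\big[(a_{ik}+a_{jl})-(a_{il}+a_{jk})\big]^2\le\frac{\gamma}{2n(n-1)^2\sigma^2}.
\end{align*}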
Applying the triangle inequality in (99) and the same reasoning to the remaining three terms shows that $E|V|\mathbf{1}_{2,1}\le 2\gamma/(n(n-1)^2\sigma^2)$. By symmetry, the term $V\mathbf{1}_{2,2}$ can be bounded in the same way, and we obtain

$$E|V|\mathbf{1}_2\le\frac{4\gamma}{n(n-1)^2\sigma^2}.\tag{101}$$

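Decomposition on $R=1$: The event $R=1$ can occur in four different ways, depending on which element of $\{\pi(I^\dagger),\pi(J^\dagger)\}$ equals an element of $\{K^\dagger,L^\dagger\}$. We therefore decompose $\mathbf{1}_1$ to yield

$$V\mathbf{1}_1=V\mathbf{1}_{1,1}+V\mathbf{1}_{1,2}+V\mathbf{1}_{1,3}+V\mathbf{1}_{1,4},\tag{102}$$

where $\mathbf{1}_{1,1}=\mathbf{1}(\pi(I^\dagger)=K^\dagger,\pi(J^\dagger)\neq L^\dagger)$. On this event the sum in (95) runs over $\{I^\dagger,J^\dagger,\pi^{-1}(L^\dagger)\}$, and the remaining three indicators in (102) are specified similarly. On $\mathbf{1}_{1,1}$ we have $\pi^\dagger=\pi\tau_{\pi^{-1}(L^\dagger),J^\dagger}$ and $\pi^\ddagger=\pi\tau_{\pi^{-1}(L^\dagger),J^\dagger}\tau_{J^\dagger,I^\dagger}$, so that $\pi^\dagger(\pi^{-1}(L^\dagger))=\pi^\ddagger(\pi^{-1}(L^\dagger))=\pi(J^\dagger)$. Using this and (95), it follows that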

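\begin{align*}
V\mathbf{1}_{1,1}&=\sum_{i\in\{I^\dagger,J^\dagger,\pi^{-1}(L^\dagger)\}}\big(Ua_{i,\pi^\dagger(i)}+(1-U)a_{i,\pi^\ddagger(i)}-a_{i,\pi(i)}\big)\mathbf{1}_{1,1}\\
&=\big[U(a_{I^\dagger,\pi^\dagger(I^\dagger)}+a_{J^\dagger,\pi^\dagger(J^\dagger)}+a_{\pi^{-1}(L^\dagger),\pi^\dagger(\pi^{-1}(L^\dagger))})\\
&\qquad+(1-U)(a_{I^\dagger,\pi^\ddagger(I^\dagger)}+a_{J^\dagger,\pi^\ddagger(J^\dagger)}+a_{\pi^{-1}(L^\dagger),\pi^\ddagger(\pi^{-1}(L^\dagger))})\\
&\qquad-(a_{I^\dagger,\pi(I^\dagger)}+a_{J^\dagger,\pi(J^\dagger)}+a_{\pi^{-1}(L^\dagger),\pi(\pi^{-1}(L^\dagger))})\big]\mathbf{1}_{1,1}\\
&=\big[U(a_{I^\dagger,K^\dagger}+a_{J^\dagger,L^\dagger}+a_{\pi^{-1}(L^\dagger),\pi(J^\dagger)})\\
&\qquad+(1-U)(a_{I^\dagger,L^\dagger}+a_{J^\dagger,K^\dagger}+a_{\pi^{-1}(L^\dagger),\pi(J^\dagger)})\\
&\qquad-(a_{I^\dagger,K^\dagger}+a_{J^\dagger,\pi(J^\dagger)}+a_{\pi^{-1}(L^\dagger),L^\dagger})\big]\mathbf{1}_{1,1}\\
&=\big[Ua_{J^\dagger,L^\dagger}+(1-U)(a_{I^\dagger,L^\dagger}+a_{J^\dagger,K^\dagger}-a_{I^\dagger,K^\dagger})\\
&\qquad-a_{J^\dagger,\pi(J^\dagger)}-a_{\pi^{-1}(L^\dagger),L^\dagger}+a_{\pi^{-1}(L^\dagger),\pi(J^\dagger)}\big]\mathbf{1}_{1,1}.\tag{103}
\end{align*}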
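The algebra in (99) and (103) can be checked numerically. The sketch below reads (95) as $V=UY(\pi^\dagger)+(1-U)Y(\pi^\ddagger)-Y(\pi)$ with $Y(\rho)=\sum_i a_{i,\rho(i)}$, since the terms at unchanged indices cancel. On $\mathbf{1}_{2,1}$ it takes $\pi^\dagger=\pi$ and $\pi^\ddagger=\pi\tau_{I^\dagger,J^\dagger}$, as the second line of (99) implies. On $\mathbf{1}_{1,1}$ it uses the permutations stated above. The helper names, the doubly centered test array, the seed and the index choices are ours and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
b = rng.standard_normal((n, n))
a = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()

def total(perm):
    """Y(perm) = sum_i a[i, perm[i]]."""
    return a[np.arange(n), perm].sum()

def swap(p, x, y):
    """p composed with the transposition (x y): x -> p[y], y -> p[x]."""
    q = p.copy()
    q[x], q[y] = p[y], p[x]
    return q

U = rng.uniform()
pi = rng.permutation(n)
I, J = 0, 1

# Event 1_{2,1}: K = pi(I), L = pi(J); pi_dag = pi and pi_ddag = pi o (I J).
K2, L2 = pi[I], pi[J]
V2 = U * total(pi) + (1 - U) * total(swap(pi, I, J)) - total(pi)
formula99 = (1 - U) * (a[I, L2] + a[J, K2] - a[I, K2] - a[J, L2])

# Event 1_{1,1}: K = pi(I), L = pi(m) for a third index m = pi^{-1}(L).
m = 2
K1, L1 = pi[I], pi[m]
pi_dag = swap(pi, m, J)        # pi o (m J)
pi_ddag = swap(pi_dag, J, I)   # pi o (m J) o (J I)
V1 = U * total(pi_dag) + (1 - U) * total(pi_ddag) - total(pi)
formula103 = (U * a[J, L1] + (1 - U) * (a[I, L1] + a[J, K1] - a[I, K1])
              - a[J, pi[J]] - a[m, L1] + a[m, pi[J]])
print(V2 - formula99, V1 - formula103)
```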

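For the first term in (103), we drop the restriction $t\neq l$ and sum over $t$ to obtain the first inequality. We then apply (97) with $|a_{il}|$ replaced by $|a_{jl}|$, and obtain

\begin{align*}
E|Ua_{J^\dagger,L^\dagger}|\mathbf{1}_{1,1}&=\frac12\sum_{i,j,k,l,s,t}|a_{jl}|\mathbf{1}(s=k,t\neq l)\,p_3(i,j,k,l,s,t)\\
&\le\frac{1}{8n^2(n-1)^2\sigma^2}\sum_{i,j,k,l}|a_{jl}|\big[(a_{ik}+a_{jl})-(a_{il}+a_{jk})\big]^2\le\frac{\gamma}{2(n-1)^2\sigma^2}.\tag{104}
\end{align*}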
Each of the second, third and fourth terms in (103) satisfies the bound (104), with $|a_{jl}|$ replaced by $|a_{il}|$, $|a_{jk}|$ and $|a_{ik}|$, respectively. Applying the corresponding forms of (97) to each gives

$$E\big|Ua_{J^\dagger,L^\dagger}+(1-U)(a_{I^\dagger,L^\dagger}+a_{J^\dagger,K^\dagger}-a_{I^\dagger,K^\dagger})\big|\mathbf{1}_{1,1}\le\frac{2\gamma}{(n-1)^2\sigma^2}.\tag{105}$$



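For the fifth term in (103), which involves $a_{J^\dagger,\pi(J^\dagger)}$ without a uniform variable factor, we obtain

\begin{align*}
E|a_{J^\dagger,\pi(J^\dagger)}|\mathbf{1}_{1,1}&=\sum_{i,j,k,l,s,t}|a_{jt}|\mathbf{1}(s=k,t\neq l)\,p_3(i,j,k,l,s,t)\\
&\le\frac{1}{4n^3(n-1)^2\sigma^2}\sum_{i,j,k,l,t}|a_{jt}|\big[(a_{ik}+a_{jl})-(a_{il}+a_{jk})\big]^2\le\frac{\gamma}{(n-1)^2\sigma^2}.\tag{106}
\end{align*}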

For the final inequality, the sum being bounded is not of the form (97), since it carries the additional index $t$, but the same reasoning applies. Moreover, the five indices of summation require that $n^2$ be replaced by $n^3$ in (98).
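The five- and six-index sums behind (106), (108) and (109) can be checked against $4n^3\gamma$ and $4n^4\gamma$. The sketch below uses a doubly centered test array, which is our stand-in for (91), and takes $\gamma=\sum_{i,j}|a_{ij}|^3$; the seed and size are arbitrary. It sums the extra index $t$ or $u$ in closed form through row and column sums of $|a|$. It also checks the elementary fact $n/(n-2)\le3$ for $n\ge3$, which enters the last step of (109).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
b = rng.standard_normal((n, n))
a = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()
gamma = np.sum(np.abs(a) ** 3)

# D[i, j, k, l] = [(a_ik + a_jl) - (a_il + a_jk)]^2
D = (a[:, None, :, None] + a[None, :, None, :]
     - a[:, None, None, :] - a[None, :, :, None]) ** 2

row_abs = np.abs(a).sum(axis=1)   # row_abs[j] = sum_t |a_jt|
col_abs = np.abs(a).sum(axis=0)   # col_abs[l] = sum_u |a_ul|

# Five-index sums behind (106) and (108); the extra index is summed in closed form.
S5_fifth = np.sum(D * row_abs[None, :, None, None])   # sum_{i,j,k,l,t} |a_jt| D
S5_sixth = np.sum(D * col_abs[None, None, None, :])   # sum_{i,j,k,l,u} |a_ul| D
# Six-index sum behind (109): |a_ut| shares no index with D.
S6 = np.abs(a).sum() * D.sum()                        # sum_{i,j,k,l,t,u} |a_ut| D
print(S5_fifth / (n**3 * gamma), S5_sixth / (n**3 * gamma), S6 / (n**4 * gamma))
```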

To handle the sixth term in (103), which involves $a_{\pi^{-1}(L^\dagger),L^\dagger}$, we need the joint distribution

$$p_4(i,j,k,l,s,t,u)=P\big((I^\dagger,J^\dagger,K^\dagger,L^\dagger,\pi(I^\dagger),\pi(J^\dagger),\pi^{-1}(L^\dagger))=(i,j,k,l,s,t,u)\big),$$

which accounts for the value $u$ taken on by $\pi^{-1}(L^\dagger)$. If $l$ equals $s$ or $t$, then $u$ is already fixed at $i$ or $j$, respectively. Otherwise, $\pi^{-1}(L^\dagger)$ is free to take any of the remaining $n-2$ available values, with equal probability. Hence, with $p_3$ given by (100), we deduce that

$$p_4(i,j,k,l,s,t,u)=\begin{cases}p_3(i,j,k,l,s,t),&\text{if }(l,u)\in\{(s,i),(t,j)\},\\[2pt]\dfrac{p_3(i,j,k,l,s,t)}{n-2},&\text{if }l\notin\{s,t\}\text{ and }u\notin\{i,j\},\\[2pt]0,&\text{otherwise.}\end{cases}$$
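The case formula for $p_4$ can be checked by exact enumeration. We assume $\pi$ is uniform over permutations, so that conditioning on $\pi(i)=s$ and $\pi(j)=t$ leaves the other $n-2$ values uniformly arranged. The sketch below computes the conditional law of $u=\pi^{-1}(l)$ for every $l$, with $n=6$ and an arbitrary choice of $i,j,s,t$.

```python
import itertools
from collections import Counter
from fractions import Fraction

n = 6
i, j, s, t = 0, 1, 2, 4   # arbitrary, with i != j and s != t
cond = [p for p in itertools.permutations(range(n)) if p[i] == s and p[j] == t]

def law_of_inverse(l):
    """Conditional law of u = pi^{-1}(l) given pi(i) = s and pi(j) = t."""
    c = Counter(p.index(l) for p in cond)   # p.index(l) is the x with p[x] = l
    return {u: Fraction(k, len(cond)) for u, k in c.items()}

laws = {l: law_of_inverse(l) for l in range(n)}
print(laws[s], laws[t], laws[3])
```

The three cases of the display correspond to $l=s$, $l=t$ and $l\notin\{s,t\}$.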

For example, on $\mathbf{1}_{1,1}$, where $\pi(I^\dagger)=K^\dagger$ and $\pi(J^\dagger)\neq L^\dagger$, the value $u$ of $\pi^{-1}(L^\dagger)$ is neither $i$ nor $j$. The second case above is therefore the relevant one, and the first sum on the third line of the following display is expected to vanish.

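Now, calculating with the density $p_4$, for the sixth term in (103) we have

\begin{align*}
E|a_{\pi^{-1}(L^\dagger),L^\dagger}|\mathbf{1}_{1,1}&=\sum_{i,j,k,l,s,t,u}|a_{ul}|\mathbf{1}(s=k,t\neq l)\,p_4(i,j,k,l,s,t,u)\\
&=\sum_{i,j,k,l,t,u:\,t\neq l}|a_{ul}|\,p_4(i,j,k,l,k,t,u)\\
&=\sum_{i,j,k,t}|a_{ik}|\,p_3(i,j,k,k,k,t)+\frac{1}{n-2}\sum_{l\notin\{k,t\},\,u\notin\{i,j\}}|a_{ul}|\,p_3(i,j,k,l,k,t)\\
&=\frac{1}{n-2}\sum_{l\notin\{k,t\},\,u\notin\{i,j\}}|a_{ul}|\,p_2(i,j,k,l)\,p_1(k,t)\\
&=\frac{1}{(n)_2}\sum_{l\neq k,\,u\notin\{i,j\}}|a_{ul}|\,p_2(i,j,k,l)\tag{107}\\
&\le\frac{1}{4n^3(n-1)^2\sigma^2}\sum_{i,j,k,l,u}|a_{ul}|\big[(a_{ik}+a_{jl})-(a_{il}+a_{jk})\big]^2\\
&\le\frac{\gamma}{(n-1)^2\sigma^2},\tag{108}
\end{align*}

where $(n)_k=n(n-1)\cdots(n-k+1)$ denotes the falling factorial, and the final inequality is obtained from (97) in the same way as for (106).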

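The computation for the seventh term in (103) begins in the same way as that for the sixth. It yields the analogue of (107) with $|a_{ut}|$ replacing $|a_{ul}|$ and the sum over $t$ retained, so that

\begin{align*}
E|a_{\pi^{-1}(L^\dagger),\pi(J^\dagger)}|\mathbf{1}_{1,1}&=\frac{1}{(n)_3}\sum_{t\notin\{k,l\},\,u\notin\{i,j\}}|a_{ut}|\,p_2(i,j,k,l)\\
&\le\frac{1}{4(n)_3n^2(n-1)\sigma^2}\sum_{i,j,k,l,t,u}|a_{ut}|\big[(a_{ik}+a_{jl})-(a_{il}+a_{jk})\big]^2\\
&\le\frac{n^2\gamma}{(n)_3(n-1)\sigma^2}\le\frac{3\gamma}{(n-1)^2\sigma^2},\tag{109}
\end{align*}

where we have applied reasoning as in (97), replaced $n^2$ by $n^4$ in (98) because the sum runs over six indices, and used $n/(n-2)\le3$ for $n\ge3$.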

Returning to (103), we add the contribution (105) from the first four terms to (106), (108) and (109) from the fifth, sixth and seventh terms, respectively. This gives $E|V|\mathbf{1}_{1,1}\le7\gamma/((n-1)^2\sigma^2)$. By symmetry, all four terms on the right-hand side of (102) can be handled in the same way as the first, and we obtain the following bound on the event $R=1$:

$$E|V|\mathbf{1}_1\le\frac{28\gamma}{(n-1)^2\sigma^2}.\tag{110}$$
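Decomposition on $R=0$: We have

$$\mathbf{1}_0=\mathbf{1}\big(\pi(I^\dagger)\notin\{K^\dagger,L^\dagger\},\ \pi(J^\dagger)\notin\{K^\dagger,L^\dagger\}\big),$$

and, from (95),

\begin{align*}
V\mathbf{1}_0&=\sum_{i\in\{I^\dagger,J^\dagger,\pi^{-1}(K^\dagger),\pi^{-1}(L^\dagger)\}}\big(Ua_{i,\pi^\dagger(i)}+(1-U)a_{i,\pi^\ddagger(i)}-a_{i,\pi(i)}\big)\mathbf{1}_0\\
&=\big[U(a_{I^\dagger,K^\dagger}+a_{J^\dagger,L^\dagger})+(1-U)(a_{I^\dagger,L^\dagger}+a_{J^\dagger,K^\dagger})\\
&\qquad+a_{\pi^{-1}(K^\dagger),\pi(I^\dagger)}+a_{\pi^{-1}(L^\dagger),\pi(J^\dagger)}\\
&\qquad-(a_{I^\dagger,\pi(I^\dagger)}+a_{J^\dagger,\pi(J^\dagger)}+a_{\pi^{-1}(K^\dagger),K^\dagger}+a_{\pi^{-1}(L^\dagger),L^\dagger})\big]\mathbf{1}_0.\tag{111}
\end{align*}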
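The ten-term expansion (111) can be checked numerically in the same way. This section does not state $\pi^\dagger$ and $\pi^\ddagger$ on $\mathbf{1}_0$, so the sketch below assumes $\pi^\dagger=\pi\tau_{I^\dagger,\pi^{-1}(K^\dagger)}\tau_{J^\dagger,\pi^{-1}(L^\dagger)}$ and $\pi^\ddagger=\pi^\dagger\tau_{I^\dagger,J^\dagger}$. This is our own reading, chosen because it sends $I^\dagger\mapsto K^\dagger$ and $J^\dagger\mapsto L^\dagger$ (respectively $I^\dagger\mapsto L^\dagger$ and $J^\dagger\mapsto K^\dagger$) and reproduces the index pattern of (111). The helper names, seed and indices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
b = rng.standard_normal((n, n))
a = b - b.mean(axis=1, keepdims=True) - b.mean(axis=0, keepdims=True) + b.mean()

def total(perm):
    """Y(perm) = sum_i a[i, perm[i]]."""
    return a[np.arange(n), perm].sum()

def swap(p, x, y):
    """p composed with the transposition (x y)."""
    q = p.copy()
    q[x], q[y] = p[y], p[x]
    return q

U = rng.uniform()
pi = rng.permutation(n)
I, J = 0, 1
mK, mL = 2, 3              # play the roles of pi^{-1}(K) and pi^{-1}(L)
K, L = pi[mK], pi[mL]      # then pi(I), pi(J) avoid {K, L}: the event R = 0

pi_dag = swap(swap(pi, I, mK), J, mL)   # assumed: pi o (I mK) o (J mL)
pi_ddag = swap(pi_dag, I, J)            # assumed: pi_dag o (I J)

V0 = U * total(pi_dag) + (1 - U) * total(pi_ddag) - total(pi)
formula111 = (U * (a[I, K] + a[J, L]) + (1 - U) * (a[I, L] + a[J, K])
              + a[mK, pi[I]] + a[mL, pi[J]]
              - (a[I, pi[I]] + a[J, pi[J]] + a[mK, K] + a[mL, L]))
print(V0 - formula111)
```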

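The first four terms in (111) have the same distribution, so using (97) we bound their contribution to $E|V|\mathbf{1}_0$ by

\begin{align*}
4E\,U|a_{I^\dagger,K^\dagger}|\mathbf{1}_0&\le4E\,U|a_{I^\dagger,K^\dagger}|=2\sum_{i,j,k,l}|a_{ik}|\,p_2(i,j,k,l)\\
&=\frac{1}{2n^2(n-1)\sigma^2}\sum_{i,j,k,l}|a_{ik}|\big[(a_{ik}+a_{jl})-(a_{il}+a_{jk})\big]^2\le\frac{2\gamma}{(n-1)\sigma^2}.\tag{112}
\end{align*}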

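The sum of the contributions from the fifth and sixth terms of (111) can be bounded as

\begin{align*}
2E|a_{\pi^{-1}(L^\dagger),\pi(J^\dagger)}|\mathbf{1}_0&=2\sum_{s\notin\{k,l\},\,t\notin\{k,l\}}|a_{ut}|\,p_4(i,j,k,l,s,t,u)\\
&=\frac{2}{n-2}\sum_{s\notin\{k,l\},\,t\notin\{k,l\},\,u\notin\{i,j\},\,s\neq t}|a_{ut}|\,p_3(i,j,k,l,s,t)\\
&\le\frac{n-3}{2(n-2)n^3(n-1)^2\sigma^2}\sum_{i,j,k,l,t,u}|a_{ut}|\big[(a_{ik}+a_{jl})-(a_{il}+a_{jk})\big]^2\tag{113}\\
&\le\frac{2n(n-3)\gamma}{(n-2)(n-1)^2\sigma^2}\le\frac{2\gamma}{(n-1)\sigma^2},\tag{114}
\end{align*}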

Here inequality (113) is obtained by summing over the $n-3$ choices of $s$ and dropping the remaining restrictions. The next inequality follows the reasoning of (97) with $n^4$ in place of $n^2$ in (98), and the last uses $n(n-3)\le(n-2)(n-1)$.
For the sum of the contributions from the seventh and eighth terms of (111), we sum over the $n-3$ choices of $t$ and then drop the remaining restrictions to obtain the first inequality. This gives

\begin{align*}
2E|a_{I^\dagger,\pi(I^\dagger)}|\mathbf{1}_0&=2\sum_{s\notin\{k,l\},\,t\notin\{k,l\},\,s\neq t}|a_{is}|\,p_3(i,j,k,l,s,t)\\
&\le\frac{n-3}{2n^3(n-1)^2\sigma^2}\sum_{i,j,k,l,s}|a_{is}|\big[(a_{ik}+a_{jl})-(a_{il}+a_{jk})\big]^2\\
&\le\frac{2(n-3)\gamma}{(n-1)^2\sigma^2}\le\frac{2\gamma}{(n-1)\sigma^2}.\tag{115}
\end{align*}

The total contribution of the ninth and tenth terms can be bounded like the sum of the fifth and sixth. This yields (113) with $|a_{ul}|$ replacing $|a_{ut}|$, and summing over the $n$ choices of $t$ then gives

\begin{align*}
2E|a_{\pi^{-1}(L^\dagger),L^\dagger}|\mathbf{1}_0&\le\frac{n-3}{2(n-2)n^2(n-1)^2\sigma^2}\sum_{i,j,k,l,u}|a_{ul}|\big[(a_{ik}+a_{jl})-(a_{il}+a_{jk})\big]^2\\
&\le\frac{2n(n-3)\gamma}{(n-2)(n-1)^2\sigma^2}\le\frac{2\gamma}{(n-1)\sigma^2}.\tag{116}
\end{align*}

Adding the bounds for the first four terms (112), the fifth and sixth terms (114), the seventh and eighth terms (115) and the ninth and tenth terms (116) yields

$$E|V|\mathbf{1}_0\le\frac{8\gamma}{(n-1)\sigma^2}.\tag{117}$$

Since $W^*=(Y/\sigma)^*=Y^*/\sigma$, we have $E|W^*-W|=E|V|/\sigma$. Hence, summing the $R=2$, $R=1$ and $R=0$ contributions to $E|V|$ given in (101), (110) and (117), respectively, and applying Theorem 1.1 completes the proof of the theorem. $\square$
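The arithmetic that completes the proof can be tallied exactly. The sketch below adds the coefficients of $\gamma/\sigma^2$ from (101), (110) and (117) and divides by $\sigma$ via $E|W^*-W|=E|V|/\sigma$. It then doubles the result, assuming Theorem 1.1 reads $\|F-\Phi\|_1\le2E|W^*-W|$ as in the abstract. The function names are ours. The elementary inequalities used in (109) and (114)-(116) are checked as well.

```python
from fractions import Fraction

def ev_bound(n):
    """Coefficient c(n) in E|V| <= c(n) * gamma / sigma^2, from (101), (110), (117)."""
    r2 = Fraction(4, n * (n - 1) ** 2)  # R = 2, (101)
    r1 = Fraction(28, (n - 1) ** 2)     # R = 1, (110): 4 x (2 + 1 + 1 + 3)
    r0 = Fraction(8, n - 1)             # R = 0, (117): 2 + 2 + 2 + 2
    return r2 + r1 + r0

def l1_coefficient(n):
    """C(n) in ||F - Phi||_1 <= 2 E|V| / sigma <= C(n) * gamma / sigma^3."""
    return 2 * ev_bound(n)

def elementary_inequalities_hold(n):
    """Elementary facts used in (109) and (114)-(116), for n >= 3."""
    return (Fraction(n, n - 2) <= 3
            and Fraction(n * (n - 3), (n - 2) * (n - 1)) <= 1
            and Fraction(n - 3, (n - 1) ** 2) <= Fraction(1, n - 1))

for n in (3, 10, 100):
    print(n, l1_coefficient(n) * (n - 1))
```

With these values, $(n-1)C(n)=16+56/(n-1)+8/(n(n-1))$. The resulting bound on $\|F-\Phi\|_1$ is therefore of order $\gamma/((n-1)\sigma^3)$.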



7. Remarks. Section 3.2 presents a new method of constructing zero bias couplings that closely parallels the construction of size bias couplings. This method, together with an existing construction, is then used to illustrate the zero bias approach to computing bounds to the normal in four situations.

The zero bias transformation for normal approximation is not restricted to the $L^1$ norm. The supremum norm is considered in [9] through the use of smoothing inequalities. Useful bounds are obtained there, however, only when $|Y^*-Y|$ can be almost surely bounded by a quantity small relative to $\operatorname{Var}(Y)$. At present, this restriction prevents the zero bias method from being used to compute supremum norm bounds in various examples, cone measure being one. It is hoped that this restriction may be relaxed in future work.

The Stein equation also makes it possible to derive total variation bounds, in a way similar to the manner in which the bounds used here were derived in [8]. Letting a random variable denote its own distribution, recall that the total variation distance between the distributions of $X$ and $Y$ can be defined in terms of differences in expectations over bounded measurable test functions $h$:

$$\|X-Y\|_{TV}=\frac12\sup_{|h|\le1}|Eh(X)-Eh(Y)|.\tag{118}$$
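For discrete laws, (118) reduces to familiar formulas. The sketch below uses two random laws on five points; the sizes, seed and names are ours and purely illustrative. It evaluates the supremum in (118) by brute force over the vertices $h\in\{-1,1\}^m$ of the cube, which suffices because $h\mapsto Eh(X)-Eh(Y)$ is linear. It then compares the result with half the $\ell^1$ distance and with $\max_A|P(X\in A)-P(Y\in A)|$.

```python
import itertools

import numpy as np

rng = np.random.default_rng(4)
m = 5                                   # number of support points (arbitrary)
p = rng.dirichlet(np.ones(m))           # law of X
q = rng.dirichlet(np.ones(m))           # law of Y

# (118): h -> Eh(X) - Eh(Y) is linear, so the sup over |h| <= 1 is attained
# at a vertex h in {-1, 1}^m of the cube.
tv_118 = 0.5 * max(abs(np.dot(h, p - q))
                   for h in itertools.product([-1.0, 1.0], repeat=m))

# Two classical equivalent forms of the same distance.
tv_half_l1 = 0.5 * np.abs(p - q).sum()
tv_sets = max(abs((p - q)[list(A)].sum())
              for r in range(m + 1)
              for A in itertools.combinations(range(m), r))
print(tv_118, tv_half_l1, tv_sets)
```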

Now consider the Stein equation, with $\sigma^2=1$, say, for such an $h$:

$$f'(x)-xf(x)=h(x)-Eh(Z),$$

where $Z$ is a standard normal variable. Stein [23] shows that if $|h|\le1$, then $f$ is differentiable with $|f'|\le2$. Hence, for a mean zero, variance 1 random variable $W$,

$$|Eh(W)-Eh(Z)|=|Ef'(W)-EWf(W)|=|Ef'(W)-Ef'(W^*)|\le4\|W-W^*\|_{TV},$$

where the last inequality holds because $f'/2$ is a test function bounded by 1 in (118). Dividing by 2 and taking the supremum over $h$ as indicated in (118) yields

$$\|W-Z\|_{TV}\le2\|W-W^*\|_{TV},$$

a total variation bound parallel to the $L^1$ bound in Theorem 1.1.